lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Alexey Kudinkin	5755ff25a4	[HUDI-2814] Addressing issues w/ Z-order Layout Optimization (#4060 ) * `ZCurveOptimizeHelper` > `ZOrderingIndexHelper`; Moved Z-index helper under `hudi.index.zorder` package * Tidying up `ZOrderingIndexHelper` * Fixing compilation * Fixed index new/original table merging sequence to always prefer values from new index; Cleaned up `HoodieSparkUtils` * Added test for `mergeIndexSql` * Abstracted Z-index name composition w/in `ZOrderingIndexHelper`; * Fixed `DataSkippingUtils` to interrupt prunning in case data filter contains non-indexed column reference * Properly handle exceptions origination during pruning in `HoodieFileIndex` * Make sure no errors are logged upon encountering `AnalysisException` * Cleaned up Z-index updating sequence; Tidying up comments, java-docs; * Fixed Z-index to properly handle changes of the list of clustered columns * Tidying up * `lint` * Suppressing `JavaDocStyle` first sentence check * Fixed compilation * Fixing incorrect `DecimalType` conversion * Refactored test `TestTableLayoutOptimization` - Added Z-index table composition test (against fixtures) - Separated out GC test; Tidying up * Fixed tests re-shuffling column order for Z-Index table `DataFrame` to align w/ the one by one loaded from JSON * Scaffolded `DataTypeUtils` to do basic checks of Spark types; Added proper compatibility checking b/w old/new index-tables * Added test for Z-index tables merging * Fixed import being shaded by creating internal `hudi.util` package * Fixed packaging for `TestOptimizeTable` * Revised `updateMetadataIndex` seq to provide Z-index updating process w/ source table schema * Make sure existing Z-index table schema is sync'd to source table's one * Fixed shaded refs * Fixed tests * Fixed type conversion of Parquet provided metadata values into Spark expected schemas * Fixed `composeIndexSchema` utility to propose proper schema * Added more tests for Z-index: - Checking that Z-index table is built correctly - Checking that Z-index tables are merged correctly (during update) * Fixing source table * Fixing tests to read from Parquet w/ proper schema * Refactored `ParquetUtils` utility reading stats from Parquet footers * Fixed incorrect handling of Decimals extracted from Parquet footers * Worked around issues in javac failign to compile stream's collection * Fixed handling of `Date` type * Fixed handling of `DateType` to be parsed as `LocalDate` * Updated fixture; Make sure test loads Z-index fixture using proper schema * Removed superfluous scheme adjusting when reading from Parquet, since Spark is actually able to perfectly restore schema (given Parquet was previously written by Spark as well) * Fixing race-condition in Parquet's `DateStringifier` trying to share `SimpleDataFormat` object which is inherently not thread-safe * Tidying up * Make sure schema is used upon reading to validate input files are in the appropriate format; Tidying up; * Worked around javac (1.8) inability to infer expression type properly * Updated fixtures; Tidying up * Fixing compilation after rebase * Assert clustering have in Z-order layout optimization testing * Tidying up exception messages * XXX * Added test validating Z-index lookup filter correctness * Added more test-cases; Tidying up * Added tests for string expressions * Fixed incorrect Z-index filter lookup translations * Added more test-cases * Added proper handling on complex negations of AND/OR expressions by pushing NOT operator down into inner expressions for appropriate handling * Added `-target:jvm-1.8` for `hudi-spark` module * Adding more tests * Added tests for non-indexed columns * Properly handle non-indexed columns by falling back to a re-write of containing expression as `TrueLiteral` instead * Fixed tests * Removing the parquet test files and disabling corresponding tests Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-11-26 10:02:15 -08:00
Alexey Kudinkin	cbcbec4d38	[MINOR] Fixed checkstyle config to be based off Maven root-dir (requires Maven >=3.3.1 to work properly); (#4009 ) Updated README	2021-11-16 21:30:16 -05:00
Shen Hong	236d1b0dec	[HUDI-1439] Remove scala dependency from hudi-client-common (#2306 )	2020-12-11 00:36:37 -08:00
Mathieu	1f7add9291	[HUDI-1089] Refactor hudi-client to support multi-engine (#1827 ) - This change breaks `hudi-client` into `hudi-client-common` and `hudi-spark-client` modules - Simple usages of Spark using jsc.parallelize() has been redone using EngineContext#map, EngineContext#flatMap etc - Code changes in the PR, break classes into `BaseXYZ` parent classes with no spark dependencies living in `hudi-client-common` - Classes on `hudi-spark-client` are named `SparkXYZ` extending the parent classes with all the Spark dependencies - To simplify/cleanup, HoodieIndex#fetchRecordLocation has been removed and its usages in tests replaced with alternatives Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-10-01 14:25:29 -07:00
Raymond Xu	3b9a30528b	[HUDI-996] Add functional test suite for hudi-utilities (#1746 ) - Share resources for functional tests - Add suite for functional test classes from hudi-utilities	2020-07-05 16:44:31 -07:00
Raymond Xu	366bb10d8c	[HUDI-812] Migrate hudi common tests to JUnit 5 (#1590 ) * [HUDI-812] Migrate hudi-common tests to JUnit 5	2020-05-06 19:15:20 +08:00
Suneel Marthi	8c3001363d	HUDI-479: Eliminate or Minimize use of Guava if possible (#1159 )	2020-03-28 03:11:32 -04:00
lamber-ken	4b1b3fc28c	[MINOR] Set info servity for ImportOrder temporarily (#1127 ) - Now we need fix import check error manually, disable the rule temporarily before finding a better solution.	2019-12-24 19:07:04 +08:00
lamber-ken	b284091783	[HUDI-386] Refactor hudi scala checkstyle rules (#1099 )	2019-12-22 07:30:07 +08:00
lamber-ken	2745b7552f	[HUDI-379] Refactor the codes based on new JavadocStyle code style rule (#1079 )	2019-12-06 12:59:28 +08:00
lamber-ken	c06d89b648	[HUDI-378] Refactor the rest codes based on new ImportOrder code style rule (#1078 )	2019-12-05 17:25:03 +08:00
lamber-ken	b3e0ebbc4a	[checkstyle] Add ConstantName java checkstyle rule (#1066 ) * add SimplifyBooleanExpression java checkstyle rule * collapse empty tags in scalastyle file	2019-12-04 18:59:15 +08:00
谢磊	b77fad39b5	[HUDI-364] Refactor hudi-hive based on new ImportOrder code style rule (#1048 ) [HUDI-364] Refactor hudi-hive based on new ImportOrder code style rule	2019-11-27 16:30:37 +08:00
谢磊	212282c8aa	[HUDI-358] Add Java-doc and importOrder checkstyle rule (#1043 ) - import groups are separated by one blank line - org.apache.hudi.* at the top location	2019-11-25 11:36:23 -08:00
谢磊	804e348d0e	[HUDI-346] Set allowMultipleEmptyLines to false for EmptyLineSeparator rule (#1025 )	2019-11-19 18:44:42 +08:00
lamber-ken	045fa87a3d	[HUDI-330] add EmptyStatement java checkstyle rule	2019-11-13 14:11:11 -08:00
leesf	b19bed442d	[HUDI-296] Explore use of spotless to auto fix formatting errors (#945 ) - Add spotless format fixing to project - One time reformatting for conformity - Build fails for formatting changes and mvn spotless:apply autofixes them	2019-10-10 05:19:40 -07:00
vinoth chandar	7a973a6944	[HUDI-159] Redesigning bundles for lighter-weight integrations - Documented principles applied for redesign at packaging/README.md - No longer depends on incl commons-codec, commons-io, commons-pool, commons-dbcp, commons-lang, commons-logging, avro-mapred - Introduce new FileIOUtils & added checkstyle rule for illegal import of above - Parquet, Avro dependencies moved to provided scope to enable being picked up from Hive/Spark/Presto instead - Pickup jackson jars for Hive sync tool from HIVE_HOME & unbundling jackson everywhere - Remove hive-jdbc standalone jar from being bundled in Spark/Hive/Utilities bundles - 6.5x reduced number of classes across bundles	2019-09-11 11:08:27 -07:00
leesf	8b150a3c6b	[HUDI-230] Add missing Apache License in some files	2019-08-30 09:38:28 -07:00
vinoyang	8f5e7ad5d9	[HUDI-205] Let checkstyle ban Java and Guava Optional instead of using Option provided by Hudi (#834 )	2019-08-13 17:13:52 -07:00
Balaji Varadarajan	788e4f2d2e	CodeStyle formatting to conform to basic Checkstyle rules. The code-style rules follow google style with some changes: 1. Increase line length from 100 to 120 2. Disable JavaDoc related checkstyles as this needs more manual work. Both source and test code are checked for code-style	2018-03-30 11:09:40 -07:00

21 Commits