lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Shiyan Xu	eee6a02f77	[HUDI-4456] Clean up test resources (#6203 )	2022-07-25 10:13:06 -05:00
Rahil C	af10a97e7a	[HUDI-4435] Fix Avro field not found issue introduced by Avro 1.10 (#6155 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2022-07-22 17:26:16 -07:00
Rahil C	2bf7920bd9	[MINOR] Add logger for HoodieCopyOnWriteTableInputFormat (#6161 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2022-07-21 22:27:18 +05:30
Alexey Kudinkin	a33bdd32e3	[HUDI-3993] Replacing UDF in Bulk Insert w/ RDD transformation (#5470 )	2022-07-21 06:20:47 -07:00
Sivabalan Narayanan	7da97c8096	[HUDI-4171] Fixing Non partitioned with virtual keys in read path (#5747 ) - When Non partitioned key gen is used with virtual keys, read path could break since partition path may not exist.	2022-06-06 15:48:21 -04:00
Heap	47b764ec33	[HUDI-4134] Fix Method naming consistency issues in FSUtils (#5655 )	2022-05-23 15:28:48 -07:00
Alexey Kudinkin	f7544e23ac	[HUDI-3204] Fixing partition-values being derived from partition-path instead of source columns (#5364 ) - Scaffolded `Spark24HoodieParquetFileFormat` extending `ParquetFileFormat` and overriding the behavior of adding partition columns to every row - Amended `SparkAdapter`s `createHoodieParquetFileFormat` API to be able to configure whether to append partition values or not - Fallback to append partition values in cases when the source columns are not persisted in data-file - Fixing HoodieBaseRelation incorrectly handling mandatory columns	2022-04-20 19:30:27 +08:00
董可伦	b8e465fdfc	[MINOR] Fix typos in log4j-surefire.properties (#5212 )	2022-04-15 13:33:37 -07:00
Danny Chan	0281725c6b	[MINOR] Inline the partition path logic into the builder (#5310 )	2022-04-13 16:54:39 +05:30
Sagar Sumit	df87095ef0	[HUDI-3454] Fix partition name in all code paths for LogRecordScanner (#5252 ) * Depend on FSUtils#getRelativePartitionPath(basePath, logFilePath.getParent) to get the partition. * If the list of log file paths in the split is empty, then fallback to usual behaviour.	2022-04-08 09:59:36 +05:30
董可伦	6a8396420c	[HUDI-3643] Fix hive count exception when the table is empty and the path depth is less than 3 (#5051 )	2022-04-07 04:21:03 -07:00
Raymond Xu	e96f08f355	Moving to 0.12.0-SNAPSHOT on master branch.	2022-04-06 15:24:10 +08:00
Prashant Wason	b28f0d6ceb	[HUDI-3290] Different file formats for the partition metadata file. (#5179 ) * [HUDI-3290] Different file formats for the partition metadata file. Partition metadata files are stored in each partition to help identify the base path of a table. These files are saved in the properties file format. Some query engines do not work when non Parquet/ORC files are found in a partition. Added a new table config 'hoodie.partition.metafile.use.data.format' which when enabled (default false for backward compatibility) ensures that partition metafiles will be saved in the same format as the base files of a dataset. For new datasets, the config can be set via hudi-cli. Deltastreamer has a new parameter --partition-metafile-use-data-format which will create a table with this setting. * Code review comments - Adding a new command to migrate from text to base file formats for meta file. - Reimplementing readFromFS() to first read the text format, then base format - Avoid extra exists() checks in readFromFS() - Added unit tests, enabled parquet format across hoodie-hadoop-mr - Code cleanup, restructuring, naming consistency. * Wiring in all the other Spark code paths to respect this config - Turned on parquet meta format for COW data source tests - Removed the deltastreamer command line to keep it shorter * populate HoodiePartitionMetadata#format after readFromFS() Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2022-04-04 08:08:20 -07:00
Y Ethan Guo	eaa4c4f2e2	[HUDI-1180] Upgrade HBase to 2.4.9 (#5004 ) Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>	2022-03-24 19:04:53 -07:00
Sagar Sumit	fe2c3989e3	[HUDI-3689] Fix glob path and hive sync in deltastreamer tests (#5117 ) * Remove glob pattern basePath from the deltastreamer tests. * [HUDI-3689] Fix file scheme config for CI failure in TestHoodieRealTimeRecordReader Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2022-03-24 15:48:35 +05:30
Alexey Kudinkin	099c2c099a	[HUDI-3457] Refactored Spark DataSource Relations to avoid code duplication (#4877 ) Refactoring Spark DataSource Relations to avoid code duplication. Following Relations were in scope: - BaseFileOnlyViewRelation - MergeOnReadSnapshotRelaation - MergeOnReadIncrementalRelation	2022-03-18 22:32:16 -07:00
冯健	bf191f8d46	[HUDI-3645] Fix NPE caused by multiple threads accessing non-thread-safe HashMap (#5028 ) - Change HashMap in HoodieROTablePathFilter to ConcurrentHashMap	2022-03-17 14:20:28 +05:30
Alexey Kudinkin	5e8ff8d793	[HUDI-3514] Rebase Data Skipping flow to rely on MT Column Stats index (#4948 )	2022-03-15 10:38:36 -07:00
Aditya Tiwari	051ad0b033	[HUDI-3130] Fixing Hive getSchema for RT tables addressing different partitions having different schemas (#4468 ) * Fixing Hive getSchema for RT tables * Addressing feedback * temp diff * fixing tests after spark datasource read support for metadata table is merged to master * Adding multi-partition schema evolution tests to HoodieRealTimeRecordReader Co-authored-by: Aditya Tiwari <aditya.tiwari@flipkart.com> Co-authored-by: sivabalan <n.siva.b@gmail.com>	2022-03-06 07:51:35 +05:30
Raymond Xu	b4362fac45	[HUDI-3348] Add UT to verify HoodieRealtimeFileSplit serde (#4951 )	2022-03-04 11:19:16 +04:00
Raymond Xu	c77b2591d0	[HUDI-2439] Remove SparkBoundedInMemoryExecutor (#4860 )	2022-02-26 08:02:12 -05:00
Sagar Sumit	6a5cfb45b9	[MINOR] Fix table type in input format test (#4912 )	2022-02-25 13:51:53 -05:00
Alexey Kudinkin	85e8a5c4de	[HUDI-1296] Support Metadata Table in Spark Datasource (#4789 ) * Bootstrapping initial support for Metadata Table in Spark Datasource - Consolidated Avro/Row conversion utilities to center around Spark's AvroDeserializer ; removed duplication - Bootstrapped HoodieBaseRelation - Updated HoodieMergeOnReadRDD to be able to handle Metadata Table - Modified MOR relations to be able to read different Base File formats (Parquet, HFile)	2022-02-24 16:23:13 -05:00
Alexey Kudinkin	aaddaf524a	[HUDI-3280] Cleaning up Hive-related hierarchies after refactoring (#4743 )	2022-02-16 15:36:37 -08:00
Sivabalan Narayanan	ba4e732ba7	[HUDI-2987] Update all deprecated calls to new apis in HoodieRecordPayload (#4681 )	2022-02-10 19:19:33 -05:00
Alexey Kudinkin	464027ec37	[HUDI-3239] Convert `BaseHoodieTableFileIndex` to Java (#4669 ) Converting BaseHoodieTableFileIndex to Java, removing Scala as a dependency from "hudi-common"	2022-02-09 18:42:08 -05:00
Alexey Kudinkin	973087f385	[HUDI-3276] Rebased Parquet-based `FileInputFormat` impls to inherit from `MapredParquetInputFormat` (#4667 ) Rebased Parquet-based FileInputFormat impls to inherit from MapredParquetInputFormat, to make sure that Hive is appropriately recognizing those impls and applying corresponding optimizations. - Converted HoodieRealtimeFileInputFormatBase and HoodieFileInputFormatBase into standalone implementations that could be instantiated as standalone objects (which could be used for delegation) - Renamed HoodieFileInputFormatBase > HoodieCopyOnWriteTableInputFormat, HoodieRealtimeFileInputFormatBase > HoodieMergeOnReadTableInputFormat - Scaffolded HoodieParquetFileInputFormatBase for all Parquet impls to inherit from - Rebased Parquet impls onto HoodieParquetFileInputFormatBase	2022-02-08 15:21:45 -05:00
Alexey Kudinkin	3f263b82ce	[HUDI-3206] Unify Hive's MOR implementations to avoid duplication (#4559 ) Unify Hive's MOR implementations to avoid duplication to avoid duplication across implementations for different file-formats (Parquet, HFile, etc) - Extracted HoodieRealtimeFileInputFormatBase (extending COW HoodieFileInputFormatBase base) - Rebased Parquet, HFile implementations onto HoodieRealtimeFileInputFormatBase - Tidying up	2022-02-07 14:06:28 -05:00
Y Ethan Guo	b8601a9f58	[HUDI-2656] Generalize HoodieIndex for flexible record data type (#3893 ) Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2022-02-03 20:24:04 -08:00
Alexey Kudinkin	69dfcda116	[HUDI-3191] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s (#4556 )	2022-02-03 14:01:41 -08:00
Alexey Kudinkin	a68e1dc2db	[HUDI-431] Adding support for Parquet in MOR `LogBlock`s (#4333 ) - Adding support for Parquet in MOR tables Log blocks Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>	2022-02-02 14:35:05 -05:00
Manoj Govindassamy	f87c47352a	[HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field (#4449 ) * [HUDI-2763] Metadata table records - support for key deduplication and virtual keys - The backing log format for the metadata table is HFile, a KeyValue type. Since the key field in the metadata record payload is a duplicate of the Key in the Cell, the redundant key field in the record can be emptied to save on the cost. - HoodieHFileWriter and HoodieHFileDataBlock will now serialize records with the key field emptied by default. HFile writer tries to find if the record has metadata payload schema field 'key' and if so it does the key trimming from the record payload. - HoodieHFileReader when reading the serialized records back from disk, it materializes the missing keyFields if any. HFile reader tries to find if the record has metadata payload schema fiels 'key' and if so it does the key materialization in the record payload. - Tests have been added to verify the default virtual keys and key deduplication support for the metadata table records. Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2022-01-26 13:34:04 -05:00
董可伦	56cd8ffae0	[HUDI-2837] Add support for using database name in incremental query (#4083 )	2022-01-22 22:11:27 -08:00
Alexey Kudinkin	4bea758738	[HUDI-3191] Rebasing Hive's FileInputFormat onto `AbstractHoodieTableFileIndex` (#4531 )	2022-01-18 14:54:51 -08:00
Yuwei XIAO	d36533735f	[HUDI-3194] fix MOR snapshot query during compaction (#4540 )	2022-01-17 17:24:24 -05:00
Alexey Kudinkin	75caa7d3d8	[HUDI-3179] Extracted common `AbstractHoodieTableFileIndex` to be shared across engines (#4520 )	2022-01-16 22:46:20 -08:00
Alexey Kudinkin	6cdcd89afa	[HUDI-3094] Unify Hive's InputFormat implementations to avoid duplication (#4417 )	2022-01-11 15:02:13 -08:00
xuzifu666	f0c2912d35	[MINOR] Remove unused methods in HoodieColumnProjectionUtils (#4408 )	2022-01-06 15:36:13 -08:00
Sivabalan Narayanan	a66212d204	[HUDI-2966] Closing LogRecordScanner in compactor (#4478 ) * Closing LogRecordScanner in compactor * Addressing comments	2022-01-05 10:57:18 +08:00
RexAn	f612a20815	[HUDI-2779] Cache BaseDir if HudiTableNotFound Exception thrown (#4014 )	2021-12-09 16:04:11 +05:30
xuzifu666	c9e18d1e7d	[HUDI-2942] add error message log in HoodieCombineHiveInputFormat (#4224 )	2021-12-07 22:05:39 -08:00
xiarixiaoyao	57c4bf8152	[HUDI-2876] for hive/presto hudi should remove the temp file which created by HoodieMergedLogRecordSanner when the query finished. (#4139 )	2021-12-06 21:33:10 +08:00
zhangyue19921010	5616830ae1	Revert "[HUDI-2489]Tuning HoodieROTablePathFilter by caching hoodieTableFileSystemView, aiming to reduce unnecessary list/get requests" Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-12-04 08:26:53 +05:30
yuzhao.cyz	a1d0ff4209	Moving to 0.11.0-SNAPSHOT on master branch.	2021-11-27 17:22:10 +08:00
Sivabalan Narayanan	8340ccb503	[HUDI-2005] Removing direct fs call in HoodieLogFileReader (#3865 )	2021-11-25 18:51:38 -05:00
Danny Chan	a2eb2b0b0a	[HUDI-2480] FileSlice after pending compaction-requested instant-time… (#3703 ) * [HUDI-2480] FileSlice after pending compaction-requested instant-time is ignored by MOR snapshot reader * include file slice after a pending compaction for spark reader Co-authored-by: garyli1019 <yanjia.gary.li@gmail.com>	2021-11-25 22:30:09 +08:00
Jimmy.Zhou	0d1e7ecdab	[MINOR] Fix typo,'multipe' corrected to 'multiple' (#4068 )	2021-11-22 17:20:23 -08:00
xiarixiaoyao	a0dae41409	[HUDI-2758] remove redundant code in the hoodieRealtimeInputFormatUitls.getRealtimeSplits (#3994 )	2021-11-15 11:29:40 +08:00
xiarixiaoyao	a40ac62e0c	[HUDI-2086]redo the logical of mor_incremental_view for hive (#3203 )	2021-11-10 15:41:07 +08:00
Genmao Yu	f41539a9cb	[HUDI-313] bugfix: NPE when select count start from a realtime table with Tez(#3630 ) Co-authored-by: dylonyu <dylonyu@tencent.com>	2021-11-06 12:16:13 -04:00

1 2 3 4

183 Commits