lanyuanxiaoyao/hudi

Author	SHA1	Message	Date
Manoj Govindassamy	973f78f5ca	[HUDI-2443] Hudi KVComparator for all HFile writer usages (#3889) * [HUDI-2443] Hudi KVComparator for all HFile writer usages - Hudi relies on custom class shading for Hbase's KeyValue.KVComparator to avoid versioning and class loading issues. There are a few places which are still using the Hbase's comparator class directly and version upgrades would make them obsolete. Refactoring the HoodieKVComparator and making all HFile writer creation using the same shaded class. * [HUDI-2443] Hudi KVComparator for all HFile writer usages - Moving HoodieKVComparator from common.bootstrap.index to common.util * [HUDI-2443] Hudi KVComparator for all HFile writer usages - Retaining the old HoodieKVComparatorV2 for bootstrap case. Adding the new comparator as HoodieKVComparatorV2 to differentiate from the old one. * [HUDI-2443] Hudi KVComparator for all HFile writer usages - Renamed HoodieKVComparatorV2 to HoodieMetadataKVComparator and moved it under the package org.apache.hudi.metadata. * Make comparator classname configurable * Revert new config and address other review comments Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>	2021-11-24 10:05:36 -08:00
rmahindra123	90f2ea2f12	[HUDI-2671] Fix kafka offset handling in Kafka Connect protocol (#4021 ) Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>	2021-11-24 10:03:58 -08:00
Sagar Sumit	9af219b7c1	[HUDI-2688] Claim the next rfc 40 for Hudi connector for Trino (#4105 )	2021-11-24 11:43:37 -05:00
Yann Byron	a234833f0a	[HUDI-2759] extract HoodieCatalogTable to coordinate spark catalog table and hoodie table (#3998 )	2021-11-24 02:12:38 -08:00
Danny Chan	0bb506fa00	[HUDI-2847] Flink metadata table supports virtual keys (#4096 )	2021-11-24 17:34:42 +08:00
Danny Chan	323be33f18	Revert "[HUDI-2799] Fix the classloader of flink write task (#4042 )" (#4069 ) This reverts commit `8281cbf762`.	2021-11-24 12:01:18 +08:00
Yann Byron	0cf2f103e0	[HUDI-2838] refresh table after drop partition (#4084 )	2021-11-23 19:46:48 -08:00
Raymond Xu	5078d29eb4	[HUDI-2818] Fix 2to3 upgrade when set `hoodie.table.keygenerator.class` (#4077 )	2021-11-23 19:43:34 -08:00
Alexey Kudinkin	18cf59507f	[HUDI-2831] Securing usages of `SimpleDateFormat` to be thread-safe (#4073 )	2021-11-23 20:25:11 -05:00
rmahindra123	fbff0799b9	[HUDI-2325] Add hive sync support to kafka connect (#3660 ) Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>	2021-11-23 15:48:06 -08:00
董可伦	969a5bf11e	[MINOR] Fix typo, rename 'HooodieAvroDeserializer' to 'HoodieAvroDeserializer' (#4064)	2021-11-23 19:10:57 +08:00
Y Ethan Guo	ca9bfa2a40	[HUDI-2332] Add clustering and compaction in Kafka Connect Sink (#3857 ) * [HUDI-2332] Add clustering and compaction in Kafka Connect Sink * Disable validation check on instant time for compaction and adjust configs * Add javadocs * Add clustering and compaction config * Fix transaction causing missing records in the target table * Add debugging logs * Fix kafka offset sync in participant * Adjust how clustering and compaction are configured in kafka-connect * Fix clustering strategy * Remove irrelevant changes from other published PRs * Update clustering logic and others * Update README * Fix test failures * Fix indentation * Fix clustering config * Add JavaCustomColumnsSortPartitioner and make async compaction enabled by default * Add test for JavaCustomColumnsSortPartitioner * Add more changes after IDE sync * Update README with clarification * Fix clustering logic after rebasing * Remove unrelated changes	2021-11-23 14:23:28 +05:30
zhangyue19921010	9ed28b1570	[HUDI-2409] Using HBase shaded jars in Hudi presto bundle (#3623 ) * using hbase-shaded-jars-in-hudi-presto-hundle * test * add hudi-common-bundle * code review * code review * code review * code review * test * test Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-11-23 11:25:12 +05:30
xiarixiaoyao	9de9951348	[HUDI-2778] Optimize statistics collection related codes and add some docs for z-order and fix some bugs (#4013) * [HUDI-2778] Optimize statistics collection related codes and add more docs for z-order. * add test code for multi-thread parquet footer read	2021-11-22 21:46:02 -08:00
Sagar Sumit	c88c2af8bf	[HUDI-2743] Assume path exists and defer fs.exists() in AbstractTableFileSystemView (#4002 )	2021-11-22 22:13:10 -05:00
Y Ethan Guo	6aa710eae0	[MINOR] Add more configuration to Kafka setup script (#3992 ) * [MINOR] Add more configuration to Kafka setup script * Add option to reuse Kafka topic * Minor fixes to README	2021-11-23 07:33:38 +05:30
Sagar Sumit	e22150fe15	[HUDI-1937] Rollback unfinished replace commit to allow updates (#3869 ) * [HUDI-1937] Rollback unfinished replace commit to allow updates while clustering * Revert and delete requested replacecommit too * Rollback pending clustering instants transactionally * No double locking and add a config to enable rollback * Update config to be clear about rollback only on conflict	2021-11-23 07:29:03 +05:30
Jimmy.Zhou	0d1e7ecdab	[MINOR] Fix typo, 'multipe' corrected to 'multiple' (#4068)	2021-11-22 17:20:23 -08:00
Y Ethan Guo	772af935d5	[HUDI-2737] Use earliest instant by default for async compaction and clustering jobs (#3991 ) Address review comments Fix test failures Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>	2021-11-23 06:49:41 +05:30
Alexey Kudinkin	3bdab01a49	[HUDI-2550] Expand File-Group candidates list for appending for MOR tables (#3986 )	2021-11-22 19:19:59 -05:00
Sagar Sumit	fe57e9beea	[HUDI-2599] Make addFilesToview and fetchLatestBaseFiles public (#4066 )	2021-11-22 12:23:50 -05:00
Sivabalan Narayanan	fc9ca6a07a	[HUDI-2559] Converting commit timestamp format to millisecs (#4024 ) - Adds support for generating commit timestamps with millisecs granularity. - Older commit timestamps (in secs granularity) will be suffixed with 999 and parsed with millisecs format.	2021-11-22 11:44:38 -05:00
Sagar Sumit	89452063b4	[MINOR] Fix instant parsing in HoodieClusteringJob (#4071 )	2021-11-22 08:57:44 -05:00
Manoj Govindassamy	7f3b89fad7	[HUDI-2472] Enabling metadata table for TestHoodieIndex test case (#4045) - Enabling the metadata table for testSimpleGlobalIndexTagLocationWhenShouldUpdatePartitionPath. This is more of a test issue.	2021-11-22 07:21:24 -05:00
zhangyue19921010	a2c91a7a9b	[HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job (#3765 ) * coding finished and need to do uts * add uts * code review * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-11-22 16:30:33 +05:30
Raymond Xu	02f7ca2b05	[HUDI-1870] Add more Spark CI build tasks (#4022 ) * [HUDI-1870] Add more Spark CI build tasks - build for spark3.0.x - build for spark-shade-unbundle-avro - fix build failures - delete unnecessary assertion for spark 3.0.x - use AvroConversionUtils#convertAvroSchemaToStructType instead of calling SchemaConverters#toSqlType directly to solve the compilation failures with spark-shade-unbundle-avro (#5) Co-authored-by: Yann <biyan900116@gmail.com>	2021-11-22 02:16:45 -08:00
Danny Chan	8281cbf762	[HUDI-2799] Fix the classloader of flink write task (#4042 )	2021-11-22 11:05:05 +08:00
董可伦	2533a9cc17	[MINOR] Fix typos (#4053 )	2021-11-21 16:34:59 +08:00
Nate Radtke	887787e8b9	[HUDI-1932] Update Hive sync timestamp when change detected (#3053 ) * Update Hive sync timestamp when change detected Only update the last commit timestamp on the Hive table when the table schema has changed or a partition is created/updated. When using AWS Glue Data Catalog as the metastore for Hive this will ensure that table versions are substantive (including schema and/or partition changes). Prior to this change when a Hive sync is performed without schema or partition changes the table in the Glue Data Catalog would have a new version published with the only change being the timestamp property. https://issues.apache.org/jira/browse/HUDI-1932 * add conditional sync flag * fix testSyncWithoutDiffs * fix HiveSyncConfig Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2021-11-21 12:11:05 +05:30
Danny Chan	520538b15d	[HUDI-2392] Make flink parquet reader compatible with decimal BINARY encoding (#4057 )	2021-11-21 13:27:18 +08:00
Danny Chan	0411f73c7d	[HUDI-2804] Add option to skip compaction instants for streaming read (#4051 )	2021-11-21 12:38:56 +08:00
leesf	74b59a44ec	[HUDI-2813] Claim RFC number for RFC for spark datasource V2 Integration (#4059 )	2021-11-20 18:59:12 -08:00
dufeng1010	305d160081	[MINOR] optimize in constructor of inputbatch class (#4040 ) Co-authored-by: 闫杜峰 <yandufeng@sinochem.com>	2021-11-21 10:11:01 +08:00
rmahindra123	1a5484d2db	[MINOR] Claim RFC number for RFC for debezium source for deltastreamer (#4047 )	2021-11-21 09:28:48 +08:00
vinoth chandar	ae0c67d9fc	[HUDI-2795] Add mechanism to safely update, delete and recover table properties (#4038) * [HUDI-2795] Add mechanism to safely update, delete and recover table properties - Fail safe mechanism, that lets queries succeed off a backup file - Readers who are not upgraded to this version of code will just fail until recovery is done. - Added unit tests that exercise all these scenarios. - Adding CLI for recovery, updation to table command. - [Pending] Add some hash based verification to ensure any rare partial writes for HDFS * Fixing upgrade/downgrade infrastructure to use new updation method	2021-11-20 08:07:40 -08:00
Harsha Teja Kanna	f4b974ac7b	[HUDI-2742] Added S3 object filter to support multiple S3EventsHoodieIncrSources single S3 meta table (#4025 )	2021-11-20 14:54:21 +05:30
Ron	6cc97cc0c9	Remove the aws packages from hudi flink bundle jar (#4050 )	2021-11-20 11:55:12 +08:00
wenningd	3dc6262437	[HUDI-2242] Add configuration inference logic for few options (#3359 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2021-11-19 19:38:38 -08:00
Manoj Govindassamy	0230d40b74	[HUDI-2796] Metadata table support for Restore action to first commit (#4039) - Adding support for the metadata table to restore to first commit and take proper action for the bootstrap on subsequent commits.	2021-11-19 20:02:57 -05:00
Manoj Govindassamy	c8617d9390	[HUDI-2472] Enabling metadata table for TestHoodieMergeOnReadTable and TestHoodieCompactor (#4023 )	2021-11-19 20:02:21 -05:00
Manoj Govindassamy	459b34240b	[HUDI-2593] Virtual keys support for metadata table (#3968) - Metadata table today has virtual keys disabled, thereby populating the metafields for each record written out and increasing the overall storage space used. Hereby adding virtual keys support for metadata table so that metafields are disabled for metadata table records. - Adding a custom KeyGenerator for Metadata table so as to not rely on the default Base/SimpleKeyGenerators which currently look for record key and partition field set in the table config. - AbstractHoodieLogRecordReader's version of processing next data block and createHoodieRecord() will be a generic version and making the derived class HoodieMetadataMergedLogRecordReader take care of the special creation of records from explicitly passed in partition names.	2021-11-19 18:11:29 -05:00
Sagar Sumit	eba354e922	[HUDI-2731] Make clustering work regardless of whether there are base… (#3970 )	2021-11-19 11:09:08 -05:00
Danny Chan	bf008762df	[HUDI-2798] Fix flink query operation fields (#4041 )	2021-11-19 23:39:37 +08:00
Danny Chan	7a00f867ae	[HUDI-2791] Allows duplicate files for metadata commit (#4033 )	2021-11-19 14:30:17 +08:00
Udit Mehrotra	4e067ca581	[HUDI-2641] Avoid deleting all inflight commits heartbeats while rolling back failed writes (#3956 )	2021-11-18 08:33:50 -05:00
wenningd	24def0b30d	[HUDI-2362] Add external config file support (#3416 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2021-11-18 01:59:26 -08:00
Danny Chan	8772cec4bd	[HUDI-2790] Fix the changelog mode of HoodieTableSource (#4029 )	2021-11-18 16:40:48 +08:00
Danny Chan	71a2ae0fd6	[HUDI-2789] Flink batch upsert for non partitioned table does not work (#4028 )	2021-11-18 13:59:03 +08:00
Sivabalan Narayanan	2d3f2a3275	[HUDI-2734] Setting default metadata enable as false for Java (#4003 )	2021-11-17 14:43:00 -05:00
Manoj Govindassamy	f715cf607f	[HUDI-2716] InLineFS support for S3FS logs (#3977 )	2021-11-17 13:59:38 -05:00


2124 Commits