lanyuanxiaoyao/hudi

Author	SHA1	Message	Date
Sagar Sumit	827549949c	[HUDI-2909] Handle logical type in TimestampBasedKeyGenerator (#4203 ) * TimestampBasedKeyGenerator was returning different values for the row-writer and non-row-writer paths. This patch fixes it and is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`)	2022-01-08 10:22:44 -05:00
Sivabalan Narayanan	2e561defe9	[HUDI-2947] Fixing checkpoint fetch in deltastreamer (#4485 ) * Addressing comments	2022-01-07 22:08:58 +05:30
Sivabalan Narayanan	8718c30324	[HUDI-3165] Enabling InProcessLockProvider for all multi-writer tests instead of FileSystemBasedLockProviderTestClass (#4427 )	2022-01-06 13:04:10 -05:00
hehexiaoduantui	50fa5a6aa7	Update HiveIncrementalPuller to configure filesystem (#4431 ) * Update HiveIncrementalPuller.java fix get FileSystem bug * Update HiveIncrementalPuller.java fix error * Update HiveIncrementalPuller.java fix error	2022-01-06 13:19:30 +05:30
Vinish Reddy	eee715b3ff	[HUDI-3168] Fixing null schema with empty commit in incremental relation (#4513 )	2022-01-05 11:43:10 -05:00
harshal	6409fc733d	[HUDI-2374] Fixing AvroDFSSource does not use the overridden schema to deserialize Avro binaries (#4353 )	2021-12-27 23:01:21 -05:00
Sivabalan Narayanan	1a5f8693aa	[HUDI-3011] Adding ability to read entire data with HoodieIncrSource with empty checkpoint (#4334 ) * Addressing comments	2021-12-22 15:43:06 +05:30
Raymond Xu	bb99836841	[HUDI-3052] Fix flaky testJsonKafkaSourceResetStrategy (#4381 )	2021-12-18 20:58:51 -05:00
Sivabalan Narayanan	77abb5ccb9	[HUDI-3054] Fixing default lock configs for FileSystemBasedLock and fixing a flaky test (#4374 )	2021-12-18 16:15:48 -05:00
Sivabalan Narayanan	47852446e8	[HUDI-3043] De-coupling multi writer tests (#4362 )	2021-12-17 21:37:45 -05:00
Sivabalan Narayanan	6eba8345cb	[HUDI-3043] Adding some test fixes to continuous mode multi writer tests (#4356 )	2021-12-17 15:45:05 -05:00
Y Ethan Guo	b5f05fd153	[HUDI-2906] Add a repair util to clean up dangling data and log files (#4278 )	2021-12-11 00:16:05 -08:00
rmahindra123	9797fdfbb2	[HUDI-2974] Make the prefix for metrics name configurable (#4274 ) Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>	2021-12-10 19:42:20 -08:00
Yuwei XIAO	f194566ed4	[HUDI-2849] Improve SparkUI job description for write path (#4222 )	2021-12-10 23:22:37 +08:00
Sagar Sumit	c7473a7b0c	[HUDI-2936] Add data count checks in async clustering tests (#4236 )	2021-12-10 09:25:37 -05:00
Sagar Sumit	6dab307e6f	[MINOR] Remove redundant and conflicting spark-hive dependency (#4228 ) Disable TestHiveSchemaProvider	2021-12-06 17:48:32 -08:00
冯健	734c9f5f2d	[HUDI-2418] Support HiveSchemaProvider (#3671 ) Co-authored-by: jian.feng <fengjian428@gmial.com>	2021-12-05 00:10:13 -08:00
ForwardXu	63b15607ff	[HUDI-2937] Introduce a pulsar implementation of hoodie write commit … (#4217 ) * [HUDI-2937] Introduce a pulsar implementation of hoodie write commit callback	2021-12-05 11:51:06 +04:00
vinoth chandar	36b69d8033	[HUDI-2935] Remove special casing of clustering in deltastreamer checkpoint retrieval (#4216 ) - We now seek backwards to find the checkpoint - No need to return empty anymore	2021-12-04 17:16:11 +08:00
Sivabalan Narayanan	e483f7c776	[HUDI-2902] Fixing populate meta fields with Hfile writers and Disabling virtual keys by default for metadata table (#4194 )	2021-12-03 07:20:21 -05:00
yuzhao.cyz	a1d0ff4209	Moving to 0.11.0-SNAPSHOT on master branch.	2021-11-27 17:22:10 +08:00
Manoj Govindassamy	3d75aca40d	[HUDI-2850] Fixing Clustering CLI - schedule and run command fixes to avoid NumberFormatException (#4101 )	2021-11-26 07:17:23 -05:00
Alexey Kudinkin	6f5d8d04cd	[HUDI-2840] Fixed DeltaStreamer to properly respect configuration passed through properties file (#4090 ) * Rebased `DFSPropertiesConfiguration` to access Hadoop config in lieu of FS to avoid confusion * Fixed `readConfig` to take Hadoop's `Configuration` instead of FS; Fixing usages * Added test for local FS access * Rebase to use `FSUtils.getFs` * Combine properties provided as a file along w/ overrides provided from the CLI * Added helper utilities to `HoodieClusteringConfig`; Make sure corresponding config methods fall back to defaults; * Fixed DeltaStreamer usage to respect properly combined configuration; Abstracted `HoodieClusteringConfig.from` convenience utility to init Clustering config from `Properties` * Tidying up * `lint` * Reverting changes to `HoodieWriteConfig` * Tidying up * Fixed incorrect merge of the props * Converted `HoodieConfig` to wrap around `Properties` into `TypedProperties` * Fixed compilation * Fixed compilation	2021-11-25 14:48:22 -08:00
Sivabalan Narayanan	6a0f079866	[HUDI-2858] Fixing handling of cluster update reject exception in deltastreamer (#4120 )	2021-11-26 01:04:07 +05:30
satishm	264e1ce63c	[HUDI-1290] fixing mysql debezium source (#4119 )	2021-11-25 11:26:59 -05:00
rmahindra123	83f8ed2ae3	[HUDI-1290] Add Debezium Source for deltastreamer (#4063 ) * add source for postgres debezium * Add tests for debezium payload * Fix test * Fix test * Add tests for debezium source * Add tests for debezium source * Fix schema for debezium * Fix checkstyle issues * Fix config issue for schema registry * Add mysql source for debezium * Fix checkstyle issues and tests * Improve code for merging toasted values * Improve code for merging toasted values Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>	2021-11-24 17:57:02 -08:00
Y Ethan Guo	bef373fa1d	[MINOR] Fix build failure due to checkstyle issues (#4111 )	2021-11-24 17:17:46 -08:00
Sivabalan Narayanan	435ea1543c	[HUDI-2793] Fixing deltastreamer checkpoint fetch/copy over (#4034 ) - Removed the copy over logic in transaction utils. Deltastreamer will go back to previous commits and get the checkpoint value.	2021-11-24 18:26:40 -05:00
Y Ethan Guo	ca9bfa2a40	[HUDI-2332] Add clustering and compaction in Kafka Connect Sink (#3857 ) * Disable validation check on instant time for compaction and adjust configs * Add javadocs * Add clustering and compaction config * Fix transaction causing missing records in the target table * Add debugging logs * Fix kafka offset sync in participant * Adjust how clustering and compaction are configured in kafka-connect * Fix clustering strategy * Remove irrelevant changes from other published PRs * Update clustering logic and others * Update README * Fix test failures * Fix indentation * Fix clustering config * Add JavaCustomColumnsSortPartitioner and make async compaction enabled by default * Add test for JavaCustomColumnsSortPartitioner * Add more changes after IDE sync * Update README with clarification * Fix clustering logic after rebasing * Remove unrelated changes	2021-11-23 14:23:28 +05:30
Y Ethan Guo	772af935d5	[HUDI-2737] Use earliest instant by default for async compaction and clustering jobs (#3991 ) Address review comments Fix test failures Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>	2021-11-23 06:49:41 +05:30
Sivabalan Narayanan	fc9ca6a07a	[HUDI-2559] Converting commit timestamp format to millisecs (#4024 ) - Adds support for generating commit timestamps with millisecs granularity. - Older commit timestamps (in secs granularity) will be suffixed with 999 and parsed with millisecs format.	2021-11-22 11:44:38 -05:00
Sagar Sumit	89452063b4	[MINOR] Fix instant parsing in HoodieClusteringJob (#4071 )	2021-11-22 08:57:44 -05:00
zhangyue19921010	a2c91a7a9b	[HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job (#3765 ) * coding finished and need to do uts * add uts * code review * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-11-22 16:30:33 +05:30
董可伦	2533a9cc17	[MINOR] Fix typos (#4053 )	2021-11-21 16:34:59 +08:00
dufeng1010	305d160081	[MINOR] optimize in constructor of inputbatch class (#4040 ) Co-authored-by: 闫杜峰 <yandufeng@sinochem.com>	2021-11-21 10:11:01 +08:00
Harsha Teja Kanna	f4b974ac7b	[HUDI-2742] Added S3 object filter to support multiple S3EventsHoodieIncrSources single S3 meta table (#4025 )	2021-11-20 14:54:21 +05:30
Manoj Govindassamy	459b34240b	[HUDI-2593] Virtual keys support for metadata table (#3968 ) - Metadata table today has virtual keys disabled, thereby populating the metafields for each record written out and increasing the overall storage space used. Hereby adding virtual keys support for metadata table so that metafields are disabled for metadata table records. - Adding a custom KeyGenerator for Metadata table so as to not rely on the default Base/SimpleKeyGenerators, which currently look for record key and partition field set in the table config. - AbstractHoodieLogRecordReader's version of processing next data block and createHoodieRecord() will be a generic version, with the derived class HoodieMetadataMergedLogRecordReader taking care of the special creation of records from explicitly passed-in partition names.	2021-11-19 18:11:29 -05:00
wenningd	24def0b30d	[HUDI-2362] Add external config file support (#3416 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2021-11-18 01:59:26 -08:00
davehagman	dfe3b84715	[HUDI-2579] Make deltastreamer checkpoint state merging more explicit (#3820 ) Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>	2021-11-09 17:37:59 -05:00
Prashant Wason	b7ee341e14	[HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe. (#2819 )	2021-11-05 09:31:42 -04:00
Sagar Sumit	5b1992a92d	[HUDI-1500] Support replace commit in DeltaSync with commit metadata preserved (#3802 )	2021-10-29 13:09:09 -04:00
Raymond Xu	d8560377c3	[HUDI-2077] Fix TestHoodieDeltaStreamerWithMultiWriter (#3849 ) Remove the logic of using deltastreamer to prep test table. Use fixture (compressed test table) instead.	2021-10-24 21:14:39 -07:00
Raymond Xu	f5d7362ee8	[HUDI-2077] Fix flakiness in TestHoodieDeltaStreamer (#3829 )	2021-10-20 23:57:12 -04:00
zhangyue19921010	e6711b171a	[HUDI-2435][BUG]Fix clustering handle errors (#3666 ) * done * remove unused imports * code reviewed * code reviewed Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-10-12 15:24:48 -07:00
Sivabalan Narayanan	5f32162a2f	[HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426 (#3590 ) * [HUDI-2285] Adding synchronous updates to metadata before completion of commits in data timeline. - This patch adds synchronous updates to metadata table. In other words, every write is first committed to metadata table followed by data table. While reading metadata table, we ignore any delta commits that are present only in metadata table and not in data table timeline. - Compaction of metadata table is fenced by the condition that we trigger compaction only when there are no inflight requests in data table. This ensures that all base files in metadata table are always in sync with data table (w/o any holes) and there could only be some extra invalid commits among delta log files in metadata table. - Due to this, archival of data table also fences itself up until compacted instant in metadata table. All writes to metadata table happen within the data table lock. So, metadata table works in one writer mode only. This might be tough to loosen since all writers write to same FILES partition and so will result in a conflict anyway. - As part of this, have added acquiring locks in data table for those operations which were not doing so before while committing (rollback, clean, compaction, cluster). To note, we were not doing any conflict resolution. All we are doing here is to commit by taking a lock, so that all writes to metadata table always come from a single writer. - Also added building block to add buckets for partitions, which will be leveraged by other indexes like record level index, etc. For now, FILES partition has only one bucket. In general, any number of buckets per partition is allowed and each partition has a fixed fileId prefix with incremental suffix for each bucket within each partition. Have fixed [HUDI-2476]. This fix is about retrying a failed compaction if it succeeded in metadata for first time, but failed w/ data table. - Enabling metadata table by default. - Adding more tests for metadata table Co-authored-by: Prashant Wason <pwason@uber.com>	2021-10-06 00:17:52 -04:00
zhangyue19921010	dd1bd62684	[HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource (#3413 ) * add ORCDFSSource to support reading orc file into hudi format && add UTs * remove unused import * simplify test * code review * code review * code review * code review * code review * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-09-29 08:54:12 -07:00
qianchutao	9067657a5f	[HUDI-2487] Fix JsonKafkaSource cannot filter empty messages from kafka (#3715 )	2021-09-28 13:47:15 +08:00
董可伦	36be287121	[MINOR] Fix typo,'Kakfa' corrected to 'Kafka' & 'parquest' corrected to 'parquet' (#3717 )	2021-09-26 21:53:39 +08:00
qianchutao	7e887b54d7	[MINOR] fix typo,'SPAKR' corrected to 'SPARK' (#3721 )	2021-09-26 21:52:35 +08:00
zhangyue19921010	2d5ac55195	[HUDI-2355][Bug]Archive service executed after cleaner finished. (#3545 ) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-09-15 19:00:04 -04:00

345 Commits