2402 Commits

Author SHA1 Message Date
Sivabalan Narayanan
7329d229d5 Adding tests to validate different key generators (#4473) 2022-01-04 10:48:04 +05:30
leesf
29ab6fb9ad [HUDI-3140] Fix bulk_insert failure on Spark 3.2.0 (#4498) 2022-01-04 09:59:59 +08:00
harshal
2b2ae34cb9 [HUDI-2558] Fixing Clustering w/ sort columns with null values fails (#4404) 2022-01-03 12:19:43 +05:30
Raymond Xu
0273f2e65d [MINOR] Update README.md (#4492)
Update Spark 3 build instructions
2022-01-02 20:34:37 -08:00
YueZhang
1e2d2c437d [HUDI-3138] Fix broken UT test for TestHiveSyncTool.testDropPartitions (#4493)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-02 22:43:30 -05:00
Yann Byron
fe9406dd33 [HUDI-3131] fix ctas error in spark3.1.1 (#4476) 2022-01-02 03:06:55 -08:00
Yann Byron
1622b52c9c [HUDI-3136] Fix merge/insert/show partitions error on Spark3.2 (#4490) 2022-01-02 02:42:10 -08:00
leesf
188d0338c4 [HUDI-3134] Fix insert error after adding columns on Spark 3.2.0 (#4488) 2022-01-01 17:38:14 -08:00
Aimiyoo
bfa169d808 [HUDI-3040] Fix HoodieSparkBootstrapExample error info for usage (#4341) 2021-12-31 23:38:38 -08:00
YueZhang
ef9923fc55 [HUDI-3107]Fix HiveSyncTool drop partitions using JDBC or hivesql or hms (#4453)
* constructDropPartitions when drop partitions using jdbc

* done

* done

* code style

* code review

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-12-31 15:56:33 +08:00
Yuwei XIAO
2444f40a4b [HUDI-3095] abstract partition filter logic to enable code reuse (#4454)
* [HUDI-3095] abstract partition filter logic to enable code reuse

* [HUDI-3095] address reviews
2021-12-31 11:07:52 +05:30
yuzhaojing
e88b5fd450 [HUDI-3120] Cache compactionPlan in buffer (#4463)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-12-31 13:12:32 +08:00
Shawy Geng
a4e622ac61 [HUDI-1951] Add bucket hash index, compatible with the hive bucket (#3173)
* [HUDI-2154] Add index key field to HoodieKey

* [HUDI-2157] Add the bucket index and its read/write implemention of Spark engine.
* revert HUDI-2154 add index key field to HoodieKey
* fix all comments and introduce a new tricky way to get index key at runtime
support double insert for bucket index
* revert spark read optimizer based on bucket index
* add the storage layout
* index tag, hash function and add ut
* fix ut
* address partial comments
* Code review feedback
* add layout config and docs
* fix ut
* rename hoodie.layout and rebase master

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-12-30 12:38:26 -08:00
yuzhaojing
0f0088fe4b [HUDI-3124] Bootstrap when timeline have completed instant (#4467)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-12-30 11:54:34 +08:00
董可伦
436becf3ea [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean (#4016) 2021-12-29 22:53:17 -05:00
Ron
674c149234 [HUDI-3083] Support component data types for flink bulk_insert (#4470)
* [HUDI-3083] Support component data types for flink bulk_insert

* add nested row type test
2021-12-30 11:15:54 +08:00
Sivabalan Narayanan
5c0e4ce005 Revert "[HUDI-3043] Revert async cleaner leak commit to unblock CI failure (#4343)" (#4465)
This reverts commit 7e7ad1558c.
2021-12-30 10:45:09 +08:00
ForwardXu
504747ecf4 [HUDI-3108] Fix Purge Drop MOR Table Cause error (#4455) 2021-12-29 20:23:23 +08:00
xuzifu666
a29b27c7ca [MINOR] HoodieInstantTimeGenerator improve method used (#4462) 2021-12-29 18:43:16 +08:00
Udit Mehrotra
9412281cb1 [HUDI-2983] Remove Log4j2 transitive dependencies (#4281) 2021-12-28 07:15:05 -08:00
Sivabalan Narayanan
3d7a8695cd Fixing dynamoDbLockConfig required prop check (#4422) 2021-12-28 15:56:30 +05:30
Yann Byron
05942e018c [HUDI-2811] Support Spark 3.2 (#4270) 2021-12-28 00:12:44 -08:00
ForwardXu
32505d5adb [HUDI-3106] Fix HiveSyncTool not sync schema (#4452) 2021-12-27 22:11:14 -08:00
Yann Byron
1f7afba5e4 [HUDI-3093] fix spark-sql query table that write with TimestampBasedKeyGenerator (#4416) 2021-12-27 21:39:52 -08:00
harshal
6409fc733d [HUDI-2374] Fixing AvroDFSSource does not use the overridden schema to deserialize Avro binaries (#4353) 2021-12-27 23:01:21 -05:00
ForwardXu
282aa68552 [HUDI-3099] Purge drop partition for spark sql (#4436) 2021-12-28 09:38:26 +08:00
Danny Chan
c81df99e50 [HUDI-3102] Do not store rollback plan in inflight instant (#4445) 2021-12-25 18:10:43 +08:00
Danny Chan
7b07aac286 [HUDI-3101] Excluding compaction instants from pending rollback info (#4443) 2021-12-25 14:10:45 +08:00
xuzifu666
4721073b43 [MINOR] Remove unused method in HoodieActiveTimeline (#4435) 2021-12-24 22:29:34 +08:00
xuzifu666
032b883bd1 [HUDI-3014] Add table option to set utc timezone (#4306) 2021-12-23 16:27:45 +08:00
Aimiyoo
57f43de1ea [MINOR] Fix DedupeSparkJob typo (#4418) 2021-12-22 11:51:26 -08:00
ForwardXu
5d93edc539 [HUDI-3060] drop table for spark sql (#4364) 2021-12-22 19:17:43 +08:00
Sivabalan Narayanan
1a5f8693aa [HUDI-3011] Adding ability to read entire data with HoodieIncrSource with empty checkpoint (#4334)
* Adding ability to read entire data with HoodieIncrSource with empty checkpoint

* Addressing comments
2021-12-22 15:43:06 +05:30
xiarixiaoyao
b5890cd17d Merge pull request #4308 from harsh1231/HUDI-3008
[HUDI-3008] Fixing HoodieFileIndex partition column parsing for nested fields
2021-12-22 16:46:57 +08:00
yuzhaojing
15eb7e81fc [HUDI-2547] Schedule Flink compaction in service (#4254)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-12-22 15:08:47 +08:00
Danny Chan
f1286c2c76 [HUDI-3032] Do not clean the log files right after compaction for metadata table (#4336) 2021-12-22 11:10:27 +08:00
Aimiyoo
92f54ce3d8 [HUDI-3027] Update hudi-examples README.md (#4330) 2021-12-21 13:36:03 -08:00
harshal patil
7d046f914a [HUDI-3008] Fixing HoodieFileIndex partition column parsing for nested fields 2021-12-21 11:54:52 +05:30
Raymond Xu
32a44bbe06 [HUDI-2970] Add test for archiving replace commit (#4345) 2021-12-21 00:01:59 -05:00
zhangyue19921010
f3f6112b75 [HUDI-3070] Add rerunFailingTestsCount for flakly testes (#4398)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-12-20 19:59:50 -08:00
Sivabalan Narayanan
982ae3d1eb [MINOR] Increasing CI timeout to 90 mins (#4407) 2021-12-20 20:27:22 -05:00
xuzifu666
f166ddad12 [MINOR] Remove unused method in HoodieActiveTimeline (#4401) 2021-12-20 22:19:37 +08:00
xuzifu666
3ca92108b2 remove unused import (#4349) 2021-12-20 16:32:41 +08:00
Manoj Govindassamy
4a48f99a59 [HUDI-3064][HUDI-3054] FileSystemBasedLockProviderTestClass tryLock fix and TestHoodieClientMultiWriter test fixes (#4384)
- Made FileSystemBasedLockProviderTestClass thread safe and fixed the
   tryLock retry logic.

 - Made TestHoodieClientMultiWriter. testHoodieClientBasicMultiWriter
   deterministic in verifying the HoodieWriteConflictException.
2021-12-19 13:31:02 -05:00
Sivabalan Narayanan
03f71ef1a2 [HUDI-2970] Adding tests for archival of replace commit actions (#4268) 2021-12-18 23:59:39 -08:00
Danny Chan
478f9f3695 [minor] fix NetworkUtils#getHostname (#4355) 2021-12-19 10:09:48 +08:00
Raymond Xu
bb99836841 [HUDI-3052] Fix flaky testJsonKafkaSourceResetStrategy (#4381) 2021-12-18 20:58:51 -05:00
Raymond Xu
f57e28fe39 [MINOR] Azure CI IT tasks clean up (#4337) 2021-12-18 17:00:56 -08:00
Sivabalan Narayanan
77abb5ccb9 [HUDI-3054] Fixing default lock configs for FileSystemBasedLock and fixing a flaky test (#4374) 2021-12-18 16:15:48 -05:00
Sivabalan Narayanan
dc40397fa9 [HUDI-3064] Fixing a bug in TransactionManager and FileSystemTestLock (#4372) 2021-12-18 11:52:11 -05:00