Commit Graph

2328 Commits

Author SHA1 Message Date
Thinking Chen
0d8ca8da4e [HUDI-3104] Kafka-connect support of hadoop config environments and properties (#4451) 2022-01-08 23:10:17 -08:00
Sivabalan Narayanan
98ec215079 [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data (#4530)
- There is a chance that the actual write eventually fails in the data table even though the commit succeeded in the metadata table (MDT). If compaction was triggered in the MDT at that point, it could include the uncommitted data, and once compacted, that data can no longer be ignored when reading from the metadata table. This patch fixes the bug by triggering metadata table compaction before the commit is applied to the metadata table.
2022-01-08 10:34:47 -05:00
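The ordering change described in HUDI-3178 can be illustrated with a small model. This is a hypothetical sketch, not Hudi's metadata writer: the class `MetadataTableSketch`, its size-based compaction threshold and its `rollback` method are invented for illustration. It shows why compacting before applying a new commit keeps the most recent (possibly failed) commit in the log, where it can still be rolled back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the HUDI-3178 ordering; not Hudi's actual API.
class MetadataTableSketch {
    // Commits applied to the metadata table (MDT) but not yet compacted.
    private final List<String> pendingLog = new ArrayList<>();
    // Commits folded into the compacted base; these can no longer be filtered out on read.
    private final List<String> compactedBase = new ArrayList<>();
    private final int compactionThreshold;

    MetadataTableSketch(int compactionThreshold) {
        this.compactionThreshold = compactionThreshold;
    }

    // Fixed ordering: compaction is decided and run BEFORE the new commit is applied,
    // so the compacted data only contains commits that were applied earlier.
    void applyCommit(String instant) {
        if (pendingLog.size() >= compactionThreshold) {
            compactedBase.addAll(pendingLog);
            pendingLog.clear();
        }
        pendingLog.add(instant);
    }

    // A commit whose data-table write later failed can only be undone while it is still in the log.
    boolean rollback(String instant) {
        return pendingLog.remove(instant);
    }

    List<String> compacted() {
        return compactedBase;
    }
}
```

With a threshold of 2, applying `c1`, `c2`, `c3` compacts only `c1` and `c2`; `c3` stays rollback-able. Had compaction run after applying the commit, `c2` would have been compacted the moment it landed, even if its data-table write later failed.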
Sagar Sumit
46bb00e4df [HUDI-3139] Shade htrace and parquet-avro in presto bundle (#4495)
Filter out unnecessary classes
2022-01-08 10:29:36 -05:00
Sagar Sumit
827549949c [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator (#4203)
* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

The timestamp-based key generator was returning different values for the row-writer and non-row-writer paths. This patch fixes that, guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`).
2022-01-08 10:22:44 -05:00
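Since the HUDI-2909 fix is behind a flag, a writer has to set that key to opt in. The sketch below is illustrative only: the key is quoted from the commit message, but the helper class `KeyGeneratorOptionsSketch` is hypothetical, and treating `"true"` as the opt-in value is an assumption (the commit message does not state the default).

```java
import java.util.Properties;

// Hypothetical helper: returns a copy of the writer properties with the HUDI-2909 flag enabled.
class KeyGeneratorOptionsSketch {
    // Key quoted verbatim from the HUDI-2909 commit message.
    static final String CONSISTENT_LOGICAL_TIMESTAMP =
        "hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled";

    static Properties withConsistentLogicalTimestamp(Properties base) {
        Properties props = new Properties();
        props.putAll(base); // copy so the caller's properties are left untouched
        props.setProperty(CONSISTENT_LOGICAL_TIMESTAMP, "true"); // assumed opt-in value
        return props;
    }
}
```

With Spark, the same key/value pair would typically be passed as a `DataFrameWriter.option(key, value)` on the Hudi write.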
Yann Byron
03a83ffeb5 [HUDI-3195] optimize spark3 pom and modify build command (#4538) 2022-01-07 23:21:39 -08:00
董可伦
4f6cdd73a3 [HUDI-3192] Spark metastore schema evolution broken (#4533) 2022-01-08 10:48:37 +08:00
Sagar Sumit
518488c633 [HUDI-3185] HoodieConfig#getBoolean should return false when default not set (#4536)
Remove unnecessary config
2022-01-07 16:20:11 -05:00
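The HUDI-3185 title states the intended lookup semantics: explicit value first, then the registered default, and `false` when neither exists. A minimal sketch of that rule follows; `BooleanConfigSketch` and its methods are hypothetical and do not reproduce `HoodieConfig`'s real implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a config lookup; only the fallback rule mirrors HUDI-3185.
class BooleanConfigSketch {
    private final Map<String, String> props = new HashMap<>();
    private final Map<String, String> defaults = new HashMap<>();

    void set(String key, String value) { props.put(key, value); }

    void setDefault(String key, String value) { defaults.put(key, value); }

    // Explicit value if present, else the default, else false.
    boolean getBoolean(String key) {
        String raw = props.getOrDefault(key, defaults.get(key));
        return Boolean.parseBoolean(raw); // parseBoolean(null) returns false
    }
}
```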
Sivabalan Narayanan
2e561defe9 [HUDI-2947] Fixing checkpoint fetch in deltastreamer (#4485)
* Fixing checkpoint fetch in deltastreamer

* Addressing comments
2022-01-07 22:08:58 +05:30
董可伦
b1df60672b [MINOR] fix typos in DDLExecutor (#4534) 2022-01-07 07:59:55 -05:00
Y Ethan Guo
76a72641f1 [HUDI-3188] Update quick start guide for Kafka Connect Sink for Hudi (#4527) 2022-01-07 07:56:08 -05:00
Raymond Xu
2467c137e4 [HUDI-3100] Add config for hive conditional sync (#4440) 2022-01-06 23:26:35 -08:00
YueZhang
b2b23f5d3a [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter (#4521)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-06 21:16:29 -05:00
Thinking Chen
d7afc58d0c [HUDI-3118] Add default HUDI_DIR in setupKafka.sh (#4460) 2022-01-06 15:46:51 -08:00
xuzifu666
f0c2912d35 [MINOR] Remove unused methods in HoodieColumnProjectionUtils (#4408) 2022-01-06 15:36:13 -08:00
Sivabalan Narayanan
8718c30324 [HUDI-3165] Enabling InProcessLockProvider for all multi-writer tests instead of FileSystemBasedLockProviderTestClass (#4427) 2022-01-06 13:04:10 -05:00
Sivabalan Narayanan
2954027b92 [HUDI-52] Enabling savepoint and restore for MOR table (#4507)
* Enabling restore for MOR table

* Fixing savepoint for compaction commits in MOR
2022-01-06 21:26:08 +05:30
Sivabalan Narayanan
b6891d253f [HUDI-44] Adding support to preserve commit metadata for compaction (#4428) 2022-01-06 20:27:37 +05:30
hehexiaoduantui
50fa5a6aa7 Update HiveIncrementalPuller to configure filesystem (#4431)
* Update HiveIncrementalPuller.java

fix bug getting the FileSystem

* Update HiveIncrementalPuller.java

fix error

* Update HiveIncrementalPuller.java

fix error
2022-01-06 13:19:30 +05:30
fengli
205e48f53f [HUDI-3132] Minor fixes for HoodieCatalog
close apache/hudi#4486
2022-01-06 11:17:23 +08:00
Vinish Reddy
eee715b3ff [HUDI-3168] Fixing null schema with empty commit in incremental relation (#4513) 2022-01-05 11:43:10 -05:00
Sagar Sumit
75133f9942 [HUDI-3170] Do not preserve filename when preserveCommitMetadata enabled (#4512) 2022-01-05 08:09:58 -05:00
Danny Chan
0e297c0c4c [HUDI-3171] Sync empty table to hive metastore (#4511) 2022-01-05 16:41:33 +08:00
Sivabalan Narayanan
a66212d204 [HUDI-2966] Closing LogRecordScanner in compactor (#4478)
* Closing LogRecordScanner in compactor

* Addressing comments
2022-01-05 10:57:18 +08:00
Nicolas Paris
37b15ff458 [HUDI-3147] Add endpoint_url to dynamodb lock provider (#4500)
Co-authored-by: Nicolas Paris <nicolas.paris@adevinta.com>
2022-01-04 16:42:28 -05:00
Manoj Govindassamy
bf4e3d63e7 [HUDI-3141] Metadata merged log record reader - avoiding NullPointerException when getting records by keys (#4505)
- HoodieMetadataMergedLogRecordReader#getRecordsByKeys() and its parent class methods
   are not thread safe. When multiple queries come in for getting log records
   by keys, they all operate on the same log record reader instance provided by
   HoodieBackedTableMetadata#openReadersIfNeeded() and they trip over each other
   as they clear/put/get the same class member records.

- The fix is to streamline the mutation of the class member records, making
   HoodieMetadataMergedLogRecordReader#getRecordsByKeys() a synchronized method
   to keep concurrent log record readers from running into an NPE.
2022-01-04 16:41:33 -05:00
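The race in HUDI-3141 comes from several callers sharing one reader whose lookup clears and refills a member map. The sketch below models that pattern and the `synchronized` fix; `SharedLogRecordReader` and `allThreadsSucceed` are hypothetical names, not Hudi classes, and the record store is a plain in-memory map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical reader modelled on the HUDI-3141 description; not Hudi's class.
class SharedLogRecordReader {
    private final Map<String, String> allRecords; // backing log records (read-only)
    // Member scratch state that every lookup clears and refills, as in the bug report.
    private final Map<String, String> records = new HashMap<>();

    SharedLogRecordReader(Map<String, String> allRecords) { this.allRecords = allRecords; }

    // synchronized: only one thread at a time may clear/put/get the shared member map.
    synchronized List<String> getRecordsByKeys(List<String> keys) {
        records.clear();
        for (String k : keys) {
            String v = allRecords.get(k);
            if (v != null) records.put(k, v);
        }
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            String v = records.get(k); // without the lock, another thread's clear() could drop this
            if (v != null) out.add(v);
        }
        return out;
    }

    // Runs lookups from several threads against one shared reader; true if every lookup was correct.
    static boolean allThreadsSucceed(SharedLogRecordReader reader, int threads) {
        AtomicBoolean ok = new AtomicBoolean(true);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread w = new Thread(() -> {
                try {
                    for (int i = 0; i < 1000; i++) {
                        List<String> got = reader.getRecordsByKeys(Collections.singletonList("k" + id));
                        if (!got.equals(Collections.singletonList("v" + id))) ok.set(false);
                    }
                } catch (RuntimeException e) {
                    ok.set(false);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return ok.get();
    }
}
```

Making the whole method `synchronized` is the simplest fix matching the commit message; it serializes lookups on a shared reader, trading some throughput for correctness.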
Sagar Sumit
aaf5727495 [HUDI-2774] Handle duplicate instants when fetching pending clustering plans (#4118) 2022-01-04 16:32:05 -05:00
Sivabalan Narayanan
7329d229d5 Adding tests to validate different key generators (#4473) 2022-01-04 10:48:04 +05:30
leesf
29ab6fb9ad [HUDI-3140] Fix bulk_insert failure on Spark 3.2.0 (#4498) 2022-01-04 09:59:59 +08:00
harshal
2b2ae34cb9 [HUDI-2558] Fixing Clustering w/ sort columns with null values fails (#4404) 2022-01-03 12:19:43 +05:30
Raymond Xu
0273f2e65d [MINOR] Update README.md (#4492)
Update Spark 3 build instructions
2022-01-02 20:34:37 -08:00
YueZhang
1e2d2c437d [HUDI-3138] Fix broken UT test for TestHiveSyncTool.testDropPartitions (#4493)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-02 22:43:30 -05:00
Yann Byron
fe9406dd33 [HUDI-3131] fix ctas error in spark3.1.1 (#4476) 2022-01-02 03:06:55 -08:00
Yann Byron
1622b52c9c [HUDI-3136] Fix merge/insert/show partitions error on Spark3.2 (#4490) 2022-01-02 02:42:10 -08:00
leesf
188d0338c4 [HUDI-3134] Fix insert error after adding columns on Spark 3.2.0 (#4488) 2022-01-01 17:38:14 -08:00
Aimiyoo
bfa169d808 [HUDI-3040] Fix HoodieSparkBootstrapExample error info for usage (#4341) 2021-12-31 23:38:38 -08:00
YueZhang
ef9923fc55 [HUDI-3107] Fix HiveSyncTool drop partitions using JDBC or hivesql or hms (#4453)
* constructDropPartitions when drop partitions using jdbc

* done

* done

* code style

* code review

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-12-31 15:56:33 +08:00
Yuwei XIAO
2444f40a4b [HUDI-3095] abstract partition filter logic to enable code reuse (#4454)
* [HUDI-3095] abstract partition filter logic to enable code reuse

* [HUDI-3095] address reviews
2021-12-31 11:07:52 +05:30
yuzhaojing
e88b5fd450 [HUDI-3120] Cache compactionPlan in buffer (#4463)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-12-31 13:12:32 +08:00
Shawy Geng
a4e622ac61 [HUDI-1951] Add bucket hash index, compatible with the hive bucket (#3173)
* [HUDI-2154] Add index key field to HoodieKey

* [HUDI-2157] Add the bucket index and its read/write implementation for the Spark engine.
* revert HUDI-2154 add index key field to HoodieKey
* fix all comments and introduce a new tricky way to get index key at runtime
support double insert for bucket index
* revert spark read optimizer based on bucket index
* add the storage layout
* index tag, hash function and add ut
* fix ut
* address partial comments
* Code review feedback
* add layout config and docs
* fix ut
* rename hoodie.layout and rebase master

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-12-30 12:38:26 -08:00
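A hash-based bucket index assigns each record to one of a fixed number of buckets by hashing its key fields, so the same key always maps to the same bucket. The helper below is a sketch of that general idea under stated assumptions: `BucketIdSketch` is hypothetical, and using `List.hashCode()` with sign-bit masking is an illustrative choice. The commit message does not establish that this exact hash matches Hive's bucketing function.

```java
import java.util.List;

// Hypothetical helper illustrating hash-mod-N bucket assignment; not verified against Hive's hash.
class BucketIdSketch {
    static int bucketId(List<String> hashKeyValues, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        // Mask the sign bit so negative hash codes still map into [0, numBuckets).
        return (hashKeyValues.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }
}
```

Because the bucket count is part of the mapping, changing it would move existing keys to different buckets; that is why such indexes usually fix the number of buckets when the table is created.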
yuzhaojing
0f0088fe4b [HUDI-3124] Bootstrap when timeline have completed instant (#4467)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-12-30 11:54:34 +08:00
董可伦
436becf3ea [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean (#4016) 2021-12-29 22:53:17 -05:00
Ron
674c149234 [HUDI-3083] Support component data types for flink bulk_insert (#4470)
* [HUDI-3083] Support component data types for flink bulk_insert

* add nested row type test
2021-12-30 11:15:54 +08:00
Sivabalan Narayanan
5c0e4ce005 Revert "[HUDI-3043] Revert async cleaner leak commit to unblock CI failure (#4343)" (#4465)
This reverts commit 7e7ad1558c.
2021-12-30 10:45:09 +08:00
ForwardXu
504747ecf4 [HUDI-3108] Fix Purge Drop MOR Table Cause error (#4455) 2021-12-29 20:23:23 +08:00
xuzifu666
a29b27c7ca [MINOR] HoodieInstantTimeGenerator improve method used (#4462) 2021-12-29 18:43:16 +08:00
Udit Mehrotra
9412281cb1 [HUDI-2983] Remove Log4j2 transitive dependencies (#4281) 2021-12-28 07:15:05 -08:00
Sivabalan Narayanan
3d7a8695cd Fixing dynamoDbLockConfig required prop check (#4422) 2021-12-28 15:56:30 +05:30
Yann Byron
05942e018c [HUDI-2811] Support Spark 3.2 (#4270) 2021-12-28 00:12:44 -08:00
ForwardXu
32505d5adb [HUDI-3106] Fix HiveSyncTool not sync schema (#4452) 2021-12-27 22:11:14 -08:00
Yann Byron
1f7afba5e4 [HUDI-3093] fix spark-sql query table that write with TimestampBasedKeyGenerator (#4416) 2021-12-27 21:39:52 -08:00