2494 Commits

Author SHA1 Message Date
Danny Chan
b87e95d621 [HUDI-3476] Remove the shade pattern for parquet for flink bundle jar (#4869) 2022-02-22 19:21:57 +08:00
Danny Chan
4affdd0c8f [HUDI-3461] The archived timeline for flink streaming reader should not be reused (#4861)
* Before the patch, the Flink streaming reader cached the meta client and thus the archived timeline;
  fetching the instant details from the reused timeline threw an exception
* Add a method in HoodieTableMetaClient to return a fresh new archived timeline each time
2022-02-22 15:54:29 +08:00
wangxianghu
4d1f74ebea [HUDI-3464] Fix wrong exception thrown from HiveSchemaProvider (#4865) 2022-02-22 10:20:20 +04:00
Sivabalan Narayanan
14dbbdf4c7 [HUDI-2189] Adding delete partitions support to DeltaStreamer (#4787) 2022-02-22 00:01:30 -05:00
Y Ethan Guo
7e1ea06eb9 [MINOR] Fix typos and improve docs in HoodieMetadataConfig (#4867) 2022-02-21 19:36:20 -08:00
Prashant Wason
0dee8edc97 [HUDI-2925] Fix duplicate cleaning of same files when unfinished clean operations are present using a config. (#4212)
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-02-21 21:53:03 -05:00
Yann Byron
0c950181aa [HUDI-3423] upgrade spark to 3.2.1 (#4815) 2022-02-21 16:52:21 -08:00
RexAn
801fdab55c [HUDI-3042] Abstract Spark update Strategy to make code more clean and remove duplicates (#4845)
Co-authored-by: Hui An <hui.an@shopee.com>
2022-02-21 06:53:09 -08:00
Pratyaksh Sharma
bf16bc122a [HUDI-349]: Added new cleaning policy based on number of hours (#3646) 2022-02-21 09:04:42 -05:00
Sivabalan Narayanan
d36fe24c9e [HUDI-3455] Fixing checkpoint management in hoodie incr source (#4850) 2022-02-21 08:19:57 -05:00
Sivabalan Narayanan
17cb5cb433 [HUDI-3432] Fixing restore with metadata enabled (#4849)
* Fixing restore with metadata enabled

* Fixing test failures
2022-02-21 18:25:30 +05:30
leesf
76b6ad6491 [HUDI-2732][RFC-38] Spark Datasource V2 Integration (#3964) 2022-02-21 20:14:07 +08:00
YueZhang
359fbfde79 [HUDI-2648] Retry FileSystem action instead of failed directly. (#3887)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-02-20 15:31:31 -05:00
Raymond Xu
0938f55a2b [HUDI-3458] Fix BulkInsertPartitioner generic type (#4854) 2022-02-20 13:51:58 -05:00
Sivabalan Narayanan
66ac1446dd [MINOR] Moving spark scheduling configs out of DataSourceOptions (#4843) 2022-02-20 13:49:18 -05:00
Bo Cui
83279971a1 [HUDI-3446] Supports batch reader in BootstrapOperator#loadRecords (#4837)
* [HUDI-3446] Supports batch Reader in BootstrapOperator#loadRecords
2022-02-19 21:21:48 +08:00
stayrascal
f15125c0cd [HUDI-3389] fix ColumnarArrayData ClassCastException issue (#4842)
* [HUDI-3389] fix ColumnarArrayData ClassCastException issue

* [HUDI-3389] remove MapColumnVector.java, RowColumnVector.java, and add test case for array<int> field
2022-02-19 10:56:41 +08:00
RexAn
5009138d04 [HUDI-3438] Avoid getSmallFiles if hoodie.parquet.small.file.limit is 0 (#4823)
Co-authored-by: Hui An <hui.an@shopee.com>
2022-02-18 08:57:04 -05:00
Y Ethan Guo
fba5822ee3 [HUDI-3430] Fix Deltastreamer to properly shut down the services upon failure (#4824) 2022-02-18 08:44:56 -05:00
luokey
de8161ae96 HoodieSortedMergeHandle#close write data disorder (#4841)
Co-authored-by: 854194341@qq.com <loukey_7821>
2022-02-18 13:31:38 +04:00
Sagar Sumit
ed106f671e [HUDI-2809] Introduce a checksum mechanism for validating hoodie.properties (#4712)
Fix dependency conflict

Fix repairs command

Implement putIfAbsent for DDB lock provider

Add upgrade step and validate while fetching configs

Validate checksum for latest table version only while fetching config

Move generateChecksum to BinaryUtil

Rebase and resolve conflict

Fix table version check
2022-02-18 10:17:06 +05:30
Danny Chan
2844a77b43 [HUDI-3439] Remove the hive shade pattern for flink bundle jar (#4833) 2022-02-17 22:42:39 +08:00
zhangxiang17
433c2573ef [HUDI-3442]Duplicate code calls for 'FlinkOptions.flatOptions' (#4832) 2022-02-17 11:04:09 +08:00
Sagar Sumit
ba0afe1426 [HUDI-3426] Sync datasource clustering config (#4828) 2022-02-16 19:02:49 -05:00
Alexey Kudinkin
aaddaf524a [HUDI-3280] Cleaning up Hive-related hierarchies after refactoring (#4743) 2022-02-16 15:36:37 -08:00
YueZhang
3363c66468 [HUDI-3394] Check isWriteLockedByCurrentThread before unlock for InProcessLockProvider (#4819)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
2022-02-15 22:41:25 -08:00
Y Ethan Guo
9a05940a74 [HUDI-3366] Remove hardcoded logic of disabling metadata table in tests (#4792) 2022-02-15 16:41:47 -05:00
Raymond Xu
538ec44fa8 [HUDI-2931] Add config to disable table services (#4777) 2022-02-15 09:49:53 -05:00
Yann Byron
fe02c64fea fix build & ci (#4822) 2022-02-15 03:40:40 -08:00
Yann Byron
cb6ca7f0d1 [HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re… (#4714) 2022-02-14 23:38:38 -05:00
Raymond Xu
27bd7b538e [HUDI-1576] Make archiving an async service (#4795) 2022-02-14 21:15:06 -05:00
Yann Byron
3b401d839c [HUDI-3200] deprecate hoodie.file.index.enable and unify to use BaseFileOnlyViewRelation to handle (#4798) 2022-02-14 17:38:01 -08:00
YueZhang
0a97a9893a [HUDI-3398] Fix TableSchemaResolver for all file formats and metadata table (#4782)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-02-14 16:02:47 -08:00
Yuqi Gu
e639d99387 [HUDI-1657] Fix the build on aarch64, Fedora 33 (#4617) 2022-02-14 15:10:18 -08:00
Raymond Xu
bcfd8efe66 [MINOR] Prevent async service from starting twice (#4801) 2022-02-14 11:06:31 -08:00
leesf
0db1e978c6 [HUDI-3254] Introduce HoodieCatalog to manage tables for Spark Datasource V2 (#4611) 2022-02-14 06:26:58 -08:00
yuzhaojing
5ca4480a38 [HUDI-3417] Switch AbstractTableFileSystemView#filterBaseFileAfterPendingCompaction log level to debug (#4805)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2022-02-14 16:18:34 +08:00
董可伦
94806d5cf7 [HUDI-3272] If mode==ignore && tableExists, do not execute write logic and sync hive (#4632) 2022-02-14 09:22:00 +05:30
RexAn
93ee09fee8 [HUDI-3412] TypedProperties no need to create new set when check key exist or not (#4791)
Co-authored-by: Hui An <hui.an@shopee.com>
2022-02-14 11:33:29 +08:00
YueZhang
76e2faa28d [HUDI-3370] The files recorded in the commit may not match the actual ones for MOR Compaction (#4753)
* use HoodieCommitMetadata to replace writeStatuses computation

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-02-14 11:12:52 +08:00
冯健
55777fec05 [HUDI-2413] fix Sql source's checkpoint issue (#3648)
* [HUDI-2413] fix Sql source's checkpoint

* Fixing sql source checkpoint handling

* Fixing docs

Co-authored-by: jian.feng <fengjian428@gmial.com>
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-02-14 08:07:48 +05:30
Y Ethan Guo
6aba00e84f [MINOR] Fix typos in Spark client related classes (#4781) 2022-02-13 06:41:58 -08:00
wangxianghu
ce9762d588 [MINOR] unused import (#4799) 2022-02-12 13:11:37 +04:00
zhangxiang17
9518f78610 [HUDI-3413]fix jackson parse error when empty message from JsonKafkaSource Using HoodieDeltaStreamer (#4794) 2022-02-12 11:37:29 +04:00
satishkotha
89ed6f062e [HUDI-3362] Fix restore to rollback pending clustering operations followed by other rolling back other commits (#4772) 2022-02-11 14:12:45 -05:00
Yann Byron
b431246710 [HUDI-3338] Custom relation instead of HadoopFsRelation (#4709)
Currently, HadoopFsRelation uses the real partition path as the value of the partition field. Unlike a normal table, however, Hudi persists the partition value in the parquet file, and in some cases the value from the partition path differs from the value of the partition field.
So here we implement BaseFileOnlyViewRelation, which lets Hudi manage its own relation.
2022-02-11 10:48:44 -08:00
Yann Byron
10474e0962 [HUDI-3402] Set TIMESTAMP_MICROS as the default value for hoodie.parquet.outputtimestamptype (#4749) 2022-02-11 12:23:55 -05:00
Sivabalan Narayanan
ba4e732ba7 [HUDI-2987] Update all deprecated calls to new apis in HoodieRecordPayload (#4681) 2022-02-10 19:19:33 -05:00
Yann Byron
2fe7a3a41f [HUDI-2610] pass the spark version when sync the table created by spark (#4758)
* [HUDI-2610] pass the spark version when sync the table created by spark

* [MINOR] sync spark version in DataSourceUtils#buildHiveSyncConfig
2022-02-10 21:05:28 +05:30
wenningd
1c778590d1 [HUDI-3395] Allow pass rollbackUsingMarkers to Hudi CLI rollback command (#4557)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2022-02-10 09:41:22 -05:00