296 Commits

Author SHA1 Message Date
Danny Chan
465d553df8 [HUDI-3600] Tweak the default cleaning strategy to be more streaming friendly for flink (#5010) 2022-03-14 14:22:07 +08:00
Danny Chan
ec24407191 [HUDI-3581] Reorganize some clazz for hudi flink (#4983) 2022-03-10 15:55:15 +08:00
Danny Chan
fe53bd2dea [HUDI-2677] Add DFS based message queue for flink writer[part3] (#4961) 2022-03-08 15:43:21 +08:00
Bo
b6bdb46f7f [MINOR][HUDI-3460]Fix HoodieDataSourceITCase
close #4959
2022-03-08 12:18:43 +08:00
todd5167
34bc752853 [HUDI-3573] flink cleanFuntion execute clean on initialization (#4936)
For flink insert overwrite operation, do the cleaning each time before the write.
2022-03-08 11:53:54 +08:00
Sivabalan Narayanan
6a46130037 [HUDI-2761] Fixing timeline server for repeated refreshes (#4812)
* Fixing timeline server for repeated refreshes
2022-03-05 10:04:16 +08:00
Bo Cui
0986d5a01d [HUDI-3460] Add reader merge memory option for flink (#4911)
* flink TM memory Optimization
2022-03-04 19:29:29 +08:00
Danny Chan
1d57bd17c2 [minor] Cosmetic changes following HUDI-3315 (#4934) 2022-03-02 17:44:52 +08:00
Gary Li
10d866f083 [HUDI-3315] RFC-35 Part-1 Support bucket index in Flink writer (#4679)
* Support bucket index in Flink writer
* Use record key as default index key
2022-03-02 15:14:44 +08:00
yuzhaojing
3b2da9f138 [HUDI-2631] In CompactFunction, set up the write schema each time with the latest schema (#4000)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2022-03-02 11:18:17 +08:00
stayrascal
3cfb52c413 [MINOR] fix get builtin function issue from Hudi catalog (#4917) 2022-03-02 11:16:19 +08:00
yuzhaojing
44b8ab6048 [HUDI-3418] Save timeout option for remote RemoteFileSystemView (#4809)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2022-02-28 15:16:40 -05:00
Bo Cui
193215201c [MINOR] Change MINI_BATCH_SIZE to 2048 (#4862)
ParquetColumnarRowSplitReader#batchSize is 2048, so Changing MINI_BATCH_SIZE to 2048 will reduce memory cache.
2022-02-28 10:45:28 +08:00
Raymond Xu
c77b2591d0 [HUDI-2439] Remove SparkBoundedInMemoryExecutor (#4860) 2022-02-26 08:02:12 -05:00
Danny Chan
a4ee7463ae [HUDI-3474] Add more document to Pipelines for the usage of this tool to build a write pipeline (#4906) 2022-02-25 19:08:51 +08:00
yanenze
943b99775b [HUDI-3488] The flink small file list should exclude file slices with pending compaction (#4893)
# this happens when the async-compaction has been configured

Co-authored-by: yanenze <yanenze@keytop.com.cn>
2022-02-24 14:45:03 +08:00
Danny Chan
4affdd0c8f [HUDI-3461] The archived timeline for flink streaming reader should not be reused (#4861)
* Before the patch, the flink streaming reader caches the meta client thus the archived timeline,
  when fetching the instant details from the reused timeline, the exception throws
* Add a method in HoodieTableMetaClient to return a fresh new archived timeline each time
2022-02-22 15:54:29 +08:00
Bo Cui
83279971a1 [HUDI-3446] Supports batch reader in BootstrapOperator#loadRecords (#4837)
* [HUDI-3446] Supports batch Reader in BootstrapOperator#loadRecords
2022-02-19 21:21:48 +08:00
stayrascal
f15125c0cd [HUDI-3389] fix ColumnarArrayData ClassCastException issue (#4842)
* [HUDI-3389] fix ColumnarArrayData ClassCastException issue

* [HUDI-3389] remove MapColumnVector.java, RowColumnVector.java, and add test case for array<int> field
2022-02-19 10:56:41 +08:00
RexAn
5009138d04 [HUDI-3438] Avoid getSmallFiles if hoodie.parquet.small.file.limit is 0 (#4823)
Co-authored-by: Hui An <hui.an@shopee.com>
2022-02-18 08:57:04 -05:00
zhangxiang17
433c2573ef [HUDI-3442]Duplicate code calls for 'FlinkOptions.flatOptions' (#4832) 2022-02-17 11:04:09 +08:00
Alexey Kudinkin
aaddaf524a [HUDI-3280] Cleaning up Hive-related hierarchies after refactoring (#4743) 2022-02-16 15:36:37 -08:00
Raymond Xu
538ec44fa8 [HUDI-2931] Add config to disable table services (#4777) 2022-02-15 09:49:53 -05:00
Yann Byron
cb6ca7f0d1 [HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re… (#4714) 2022-02-14 23:38:38 -05:00
YueZhang
76e2faa28d [HUDI-3370] The files recorded in the commit may not match the actual ones for MOR Compaction (#4753)
* use HoodieCommitMetadata to replace writeStatuses computation

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-02-14 11:12:52 +08:00
Danny Chan
b3b44236fe [HUDI-3389] Bump flink version to 1.14.3 (#4776) 2022-02-10 11:32:01 +08:00
ForwardXu
773b317983 [HUDI-2941] Show _hoodie_operation in spark sql results (#4649) 2022-02-07 06:28:13 -08:00
Y Ethan Guo
b8601a9f58 [HUDI-2656] Generalize HoodieIndex for flexible record data type (#3893)
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-02-03 20:24:04 -08:00
todd5167
2969fb3835 [HUDI-3233] Make metadata commit synchronous for flink batch
close apache/hudi#4561
2022-01-12 20:22:53 +08:00
Town
4b0111974f [HUDI-3184] hudi-flink support timestamp-micros (#4548)
* support both avro and parquet code path
* string rowdata conversion is also supported
2022-01-12 10:53:51 +08:00
Sagar Sumit
827549949c [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator (#4203)
* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

Timestampbased key generator was returning diff values for row writer and non row writer path. this patch fixes it and is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`)
2022-01-08 10:22:44 -05:00
fengli
205e48f53f [HUDI-3132] Minor fixes for HoodieCatalog
close apache/hudi#4486
2022-01-06 11:17:23 +08:00
yuzhaojing
e88b5fd450 [HUDI-3120] Cache compactionPlan in buffer (#4463)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-12-31 13:12:32 +08:00
yuzhaojing
0f0088fe4b [HUDI-3124] Bootstrap when timeline have completed instant (#4467)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-12-30 11:54:34 +08:00
Ron
674c149234 [HUDI-3083] Support component data types for flink bulk_insert (#4470)
* [HUDI-3083] Support component data types for flink bulk_insert

* add nested row type test
2021-12-30 11:15:54 +08:00
Sivabalan Narayanan
5c0e4ce005 Revert "[HUDI-3043] Revert async cleaner leak commit to unblock CI failure (#4343)" (#4465)
This reverts commit 7e7ad1558c.
2021-12-30 10:45:09 +08:00
yuzhaojing
15eb7e81fc [HUDI-2547] Schedule Flink compaction in service (#4254)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-12-22 15:08:47 +08:00
Danny Chan
d0087d4040 [HUDI-3037] Add back remote view storage config for flink (#4338) 2021-12-17 13:57:53 +08:00
Sivabalan Narayanan
7e7ad1558c [HUDI-3043] Revert async cleaner leak commit to unblock CI failure (#4343)
* Revert "[HUDI-2959] Fix the thread leak of cleaning service (#4252)"
Reverting to unblock CI failure for now. will revisit this with the right fix
2021-12-16 21:51:28 -05:00
Fugle666
29bc5fd912 [HUDI-2996] Flink streaming reader 'skip_compaction' option does not work (#4304)
close apache/hudi#4304
2021-12-14 12:21:09 +08:00
Danny Chan
8dd0444ef9 [HUDI-2984] Implement #close for AbstractTableFileSystemView (#4285) 2021-12-11 16:19:10 +08:00
Danny Chan
2dcb3f0062 [HUDI-2985] Shade jackson for hudi flink bundle jar (#4284) 2021-12-11 14:40:57 +08:00
Danny Chan
9bdcee00c0 [HUDI-2959] Fix the thread leak of cleaning service (#4252) 2021-12-11 12:08:47 +08:00
yuzhaojing
3ad9b121f1 [HUDI-2912] Fix CompactionPlanOperator typo (#4187)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-12-10 09:32:53 -08:00
Danny Chan
bd08470421 [HUDI-2957] Shade kryo jar for flink bundle jar (#4251) 2021-12-09 10:16:42 +08:00
Danny Chan
e8473b9a2b [HUDI-2951] Disable remote view storage config for flink (#4237) 2021-12-07 18:04:15 +08:00
Ron
a8fb69656f [HUDI-2877] Support flink catalog to help user use flink table conveniently (#4153)
* [HUDI-2877] Support flink catalog to help user use flink table conveniently

* Fix comment

* fix comment2
2021-12-05 10:14:29 +08:00
Danny Chan
0699521f83 [HUDI-2924] Refresh the fs view on successful checkpoints for write profile (#4199) 2021-12-03 16:12:59 +08:00
Danny Chan
f74b3d12aa [minor] Refactor write profile to always generate fs view (#4198) 2021-12-03 11:38:29 +08:00
Danny Chan
934fe54cc5 [HUDI-2914] Fix remote timeline server config for flink (#4191) 2021-12-03 08:59:10 +08:00