Danny Chan
1d57bd17c2
[minor] Cosmetic changes following HUDI-3315 ( #4934 )
2022-03-02 17:44:52 +08:00
Gary Li
10d866f083
[HUDI-3315] RFC-35 Part-1 Support bucket index in Flink writer ( #4679 )
...
* Support bucket index in Flink writer
* Use record key as default index key
2022-03-02 15:14:44 +08:00
yuzhaojing
3b2da9f138
[HUDI-2631] In CompactFunction, set up the write schema each time with the latest schema ( #4000 )
...
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2022-03-02 11:18:17 +08:00
stayrascal
3cfb52c413
[MINOR] fix get builtin function issue from Hudi catalog ( #4917 )
2022-03-02 11:16:19 +08:00
yuzhaojing
44b8ab6048
[HUDI-3418] Save timeout option for remote RemoteFileSystemView ( #4809 )
...
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2022-02-28 15:16:40 -05:00
Bo Cui
193215201c
[MINOR] Change MINI_BATCH_SIZE to 2048 ( #4862 )
...
ParquetColumnarRowSplitReader#batchSize is 2048, so changing MINI_BATCH_SIZE to 2048 will reduce the memory cache.
2022-02-28 10:45:28 +08:00
Raymond Xu
c77b2591d0
[HUDI-2439] Remove SparkBoundedInMemoryExecutor ( #4860 )
2022-02-26 08:02:12 -05:00
Danny Chan
a4ee7463ae
[HUDI-3474] Add more document to Pipelines for the usage of this tool to build a write pipeline ( #4906 )
2022-02-25 19:08:51 +08:00
yanenze
943b99775b
[HUDI-3488] The flink small file list should exclude file slices with pending compaction ( #4893 )
...
# this happens when the async-compaction has been configured
Co-authored-by: yanenze <yanenze@keytop.com.cn>
2022-02-24 14:45:03 +08:00
Danny Chan
4affdd0c8f
[HUDI-3461] The archived timeline for flink streaming reader should not be reused ( #4861 )
...
* Before the patch, the Flink streaming reader cached the meta client and thus the archived timeline;
fetching the instant details from the reused timeline then threw an exception
* Add a method in HoodieTableMetaClient to return a fresh new archived timeline each time
2022-02-22 15:54:29 +08:00
Bo Cui
83279971a1
[HUDI-3446] Supports batch reader in BootstrapOperator#loadRecords ( #4837 )
...
* [HUDI-3446] Supports batch Reader in BootstrapOperator#loadRecords
2022-02-19 21:21:48 +08:00
stayrascal
f15125c0cd
[HUDI-3389] fix ColumnarArrayData ClassCastException issue ( #4842 )
...
* [HUDI-3389] fix ColumnarArrayData ClassCastException issue
* [HUDI-3389] remove MapColumnVector.java, RowColumnVector.java, and add test case for array<int> field
2022-02-19 10:56:41 +08:00
RexAn
5009138d04
[HUDI-3438] Avoid getSmallFiles if hoodie.parquet.small.file.limit is 0 ( #4823 )
...
Co-authored-by: Hui An <hui.an@shopee.com>
2022-02-18 08:57:04 -05:00
zhangxiang17
433c2573ef
[HUDI-3442]Duplicate code calls for 'FlinkOptions.flatOptions' ( #4832 )
2022-02-17 11:04:09 +08:00
Alexey Kudinkin
aaddaf524a
[HUDI-3280] Cleaning up Hive-related hierarchies after refactoring ( #4743 )
2022-02-16 15:36:37 -08:00
Raymond Xu
538ec44fa8
[HUDI-2931] Add config to disable table services ( #4777 )
2022-02-15 09:49:53 -05:00
Yann Byron
cb6ca7f0d1
[HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re… ( #4714 )
2022-02-14 23:38:38 -05:00
YueZhang
76e2faa28d
[HUDI-3370] The files recorded in the commit may not match the actual ones for MOR Compaction ( #4753 )
...
* use HoodieCommitMetadata to replace writeStatuses computation
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-02-14 11:12:52 +08:00
Danny Chan
b3b44236fe
[HUDI-3389] Bump flink version to 1.14.3 ( #4776 )
2022-02-10 11:32:01 +08:00
ForwardXu
773b317983
[HUDI-2941] Show _hoodie_operation in spark sql results ( #4649 )
2022-02-07 06:28:13 -08:00
Y Ethan Guo
b8601a9f58
[HUDI-2656] Generalize HoodieIndex for flexible record data type ( #3893 )
...
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-02-03 20:24:04 -08:00
todd5167
2969fb3835
[HUDI-3233] Make metadata commit synchronous for flink batch
...
close apache/hudi#4561
2022-01-12 20:22:53 +08:00
Town
4b0111974f
[HUDI-3184] hudi-flink support timestamp-micros ( #4548 )
...
* support both avro and parquet code path
* string rowdata conversion is also supported
2022-01-12 10:53:51 +08:00
Sagar Sumit
827549949c
[HUDI-2909] Handle logical type in TimestampBasedKeyGenerator ( #4203 )
...
* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
The timestamp-based key generator was returning different values for the row writer and non-row writer paths. This patch fixes it, guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`)
2022-01-08 10:22:44 -05:00
fengli
205e48f53f
[HUDI-3132] Minor fixes for HoodieCatalog
...
close apache/hudi#4486
2022-01-06 11:17:23 +08:00
yuzhaojing
e88b5fd450
[HUDI-3120] Cache compactionPlan in buffer ( #4463 )
...
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-12-31 13:12:32 +08:00
yuzhaojing
0f0088fe4b
[HUDI-3124] Bootstrap when timeline have completed instant ( #4467 )
...
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-12-30 11:54:34 +08:00
Ron
674c149234
[HUDI-3083] Support component data types for flink bulk_insert ( #4470 )
...
* [HUDI-3083] Support component data types for flink bulk_insert
* add nested row type test
2021-12-30 11:15:54 +08:00
Sivabalan Narayanan
5c0e4ce005
Revert "[HUDI-3043] Revert async cleaner leak commit to unblock CI failure ( #4343 )" ( #4465 )
...
This reverts commit 7e7ad1558c.
2021-12-30 10:45:09 +08:00
yuzhaojing
15eb7e81fc
[HUDI-2547] Schedule Flink compaction in service ( #4254 )
...
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-12-22 15:08:47 +08:00
Danny Chan
d0087d4040
[HUDI-3037] Add back remote view storage config for flink ( #4338 )
2021-12-17 13:57:53 +08:00
Sivabalan Narayanan
7e7ad1558c
[HUDI-3043] Revert async cleaner leak commit to unblock CI failure ( #4343 )
...
* Revert "[HUDI-2959] Fix the thread leak of cleaning service (#4252)"
Reverting to unblock the CI failure for now. Will revisit this with the right fix.
2021-12-16 21:51:28 -05:00
Fugle666
29bc5fd912
[HUDI-2996] Flink streaming reader 'skip_compaction' option does not work ( #4304 )
...
close apache/hudi#4304
2021-12-14 12:21:09 +08:00
Danny Chan
8dd0444ef9
[HUDI-2984] Implement #close for AbstractTableFileSystemView ( #4285 )
2021-12-11 16:19:10 +08:00
Danny Chan
2dcb3f0062
[HUDI-2985] Shade jackson for hudi flink bundle jar ( #4284 )
2021-12-11 14:40:57 +08:00
Danny Chan
9bdcee00c0
[HUDI-2959] Fix the thread leak of cleaning service ( #4252 )
2021-12-11 12:08:47 +08:00
yuzhaojing
3ad9b121f1
[HUDI-2912] Fix CompactionPlanOperator typo ( #4187 )
...
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-12-10 09:32:53 -08:00
Danny Chan
bd08470421
[HUDI-2957] Shade kryo jar for flink bundle jar ( #4251 )
2021-12-09 10:16:42 +08:00
Danny Chan
e8473b9a2b
[HUDI-2951] Disable remote view storage config for flink ( #4237 )
2021-12-07 18:04:15 +08:00
Ron
a8fb69656f
[HUDI-2877] Support flink catalog to help user use flink table conveniently ( #4153 )
...
* [HUDI-2877] Support flink catalog to help user use flink table conveniently
* Fix comment
* fix comment2
2021-12-05 10:14:29 +08:00
Danny Chan
0699521f83
[HUDI-2924] Refresh the fs view on successful checkpoints for write profile ( #4199 )
2021-12-03 16:12:59 +08:00
Danny Chan
f74b3d12aa
[minor] Refactor write profile to always generate fs view ( #4198 )
2021-12-03 11:38:29 +08:00
Danny Chan
934fe54cc5
[HUDI-2914] Fix remote timeline server config for flink ( #4191 )
2021-12-03 08:59:10 +08:00
yuzhao.cyz
a1d0ff4209
Moving to 0.11.0-SNAPSHOT on master branch.
2021-11-27 17:22:10 +08:00
Danny Chan
e9efbdb63c
[HUDI-2863] Rename option 'hoodie.parquet.page.size' to 'write.parquet.page.size' ( #4128 )
2021-11-26 16:40:53 +08:00
Alexey Kudinkin
6f5d8d04cd
[HUDI-2840] Fixed DeltaStreaemer to properly respect configuration passed t/h properties file ( #4090 )
...
* Rebased `DFSPropertiesConfiguration` to access Hadoop config in lieu of FS to avoid confusion
* Fixed `readConfig` to take Hadoop's `Configuration` instead of FS;
Fixing usages
* Added test for local FS access
* Rebase to use `FSUtils.getFs`
* Combine properties provided as a file along w/ overrides provided from the CLI
* Added helper utilities to `HoodieClusteringConfig`;
Make sure corresponding config methods fallback to defaults;
* Fixed DeltaStreamer usage to respect properly combined configuration;
Abstracted `HoodieClusteringConfig.from` convenience utility to init Clustering config from `Properties`
* Tidying up
* `lint`
* Reverting changes to `HoodieWriteConfig`
* Tidying up
* Fixed incorrect merge of the props
* Converted `HoodieConfig` to wrap around `Properties` into `TypedProperties`
* Fixed compilation
* Fixed compilation
2021-11-25 14:48:22 -08:00
Danny Chan
a2eb2b0b0a
[HUDI-2480] FileSlice after pending compaction-requested instant-time… ( #3703 )
...
* [HUDI-2480] FileSlice after pending compaction-requested instant-time is ignored by MOR snapshot reader
* include file slice after a pending compaction for spark reader
Co-authored-by: garyli1019 <yanjia.gary.li@gmail.com>
2021-11-25 22:30:09 +08:00
Danny Chan
0bb506fa00
[HUDI-2847] Flink metadata table supports virtual keys ( #4096 )
2021-11-24 17:34:42 +08:00
Danny Chan
323be33f18
Revert "[HUDI-2799] Fix the classloader of flink write task ( #4042 )" ( #4069 )
...
This reverts commit 8281cbf762.
2021-11-24 12:01:18 +08:00
Sivabalan Narayanan
fc9ca6a07a
[HUDI-2559] Converting commit timestamp format to millisecs ( #4024 )
...
- Adds support for generating commit timestamps with millisecs granularity.
- Older commit timestamps (in secs granularity) will be suffixed with 999 and parsed with millisecs format.
2021-11-22 11:44:38 -05:00