pengzhiwei
2c910ee3af
[HUDI-2212] Missing PrimaryKey In Hoodie Properties For CTAS Table (#3332)
2021-07-23 15:21:57 +08:00
Danny Chan
c89bf1de20
[HUDI-2205] Rollback inflight compaction for flink writer (#3320)
2021-07-22 22:56:51 +08:00
Danny Chan
858e84b5b2
[HUDI-2198] Clean and reset the bootstrap events for coordinator when task failover (#3304)
2021-07-21 10:13:05 +08:00
yuzhaojing
634163a990
[HUDI-2145] Create new bucket when NewFileAssignState filled (#3258)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-07-20 17:46:45 +08:00
喻兆靖
2099bf41db
[HUDI-2193] Remove state in BootstrapFunction
2021-07-19 18:14:06 +08:00
yuzhao.cyz
50c2b76d72
Revert "[HUDI-2087] Support Append only in Flink stream (#3252)"
This reverts commit 783c9cb3
2021-07-16 21:36:27 +08:00
yuzhao.cyz
c8aaf00819
[HUDI-2185] Remove the default parallelism of index bootstrap and bucket assigner
2021-07-16 15:44:15 +08:00
vinoyang
52524b659d
[HUDI-2165] Support Transformer for HoodieFlinkStreamer (#3270)
* [HUDI-2165] Support Transformer for HoodieFlinkStreamer
2021-07-14 23:01:52 +08:00
yuzhaojing
783c9cb369
[HUDI-2087] Support Append only in Flink stream (#3252)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-07-10 14:49:35 +08:00
vinoyang
7c6eebf98c
[MINOR] Fix some wrong assert reasons (#3248)
2021-07-10 14:35:40 +08:00
vinoth chandar
b4562e86e4
Revert "[HUDI-2087] Support Append only in Flink stream (#3174)" (#3251)
This reverts commit 371526789d.
2021-07-09 11:20:09 -07:00
yuzhaojing
371526789d
[HUDI-2087] Support Append only in Flink stream (#3174)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-07-09 16:06:32 +08:00
wangxianghu
f2621da32f
[HUDI-2093] Fix empty avro schema path caused by duplicate parameters (#3177)
* [HUDI-2093] Fix empty avro schema path caused by duplicate parameters
* rename schema option key
* fix doc
* rename var name
2021-07-06 15:14:30 +08:00
Danny Chan
32bd8ce088
[HUDI-2132] Make coordinator events as POJO for efficient serialization (#3223)
2021-07-06 09:02:38 +08:00
Danny Chan
e6ee7bdb51
[HUDI-2129] StreamerUtil.medianInstantTime should return a valid date time string (#3221)
2021-07-05 20:56:24 +08:00
Danny Chan
7462fdefc3
[HUDI-2112] Support reading pure logs file group for flink batch reader after compaction (#3202)
2021-07-02 16:29:22 +08:00
pengzhiwei
b34d53fa9c
[HUDI-2088] Missing Partition Fields And PreCombineField In Hoodie Properties For Table Written By Flink (#3171)
2021-07-01 17:25:18 +08:00
yuzhaojing
07e93de8b4
[HUDI-2052] Support load logFile in BootstrapFunction (#3134)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-30 20:37:00 +08:00
Danny Chan
b8a8f572d6
[HUDI-2094] Supports hive style partitioning for flink writer (#3178)
2021-06-29 15:34:26 +08:00
yuzhaojing
37b7c65d8a
[HUDI-2084] Resend the uncommitted write metadata when start up (#3168)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-29 08:53:52 +08:00
Danny Chan
cdb9b48170
[HUDI-2040] Make flink writer as exactly-once by default (#3106)
2021-06-18 13:55:23 +08:00
Danny Chan
aa6342c3c9
[HUDI-2036] Move the compaction plan scheduling out of flink writer coordinator (#3101)
Since HUDI-1955 was fixed, we can move the scheduling out of the
coordinator to make the coordinator more lightweight.
2021-06-18 09:35:09 +08:00
yuzhaojing
f97dd25d41
[HUDI-2019] Set up the file system view storage config for singleton embedded server write config every time (#3102)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-17 20:28:03 +08:00
Danny Chan
6763b45dd4
[HUDI-2030] Add metadata cache to WriteProfile to reduce IO (#3090)
Keeps same number of instant metadata cache and refresh the cache on new
commits.
2021-06-17 19:10:34 +08:00
Danny Chan
cb642ceb75
[HUDI-1999] Refresh the base file view cache for WriteProfile (#3067)
Refresh the view to discover new small files.
2021-06-15 08:18:38 -07:00
swuferhong
0c4f2fdc15
[HUDI-1984] Support independent flink hudi compaction function (#3046)
2021-06-13 15:04:46 +08:00
yuzhaojing
728089a888
delete duplicate bootstrap function (#3052)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-09 19:29:57 +08:00
yuzhaojing
cf83f10f5b
add BootstrapFunction to support index bootstrap (#3024)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-08 13:55:25 +08:00
Danny Chan
08464a6a5b
[HUDI-1931] BucketAssignFunction use ValueState instead of MapState (#3026)
Co-authored-by: 854194341@qq.com <loukey_7821>
2021-06-06 10:40:15 +08:00
Danny Chan
a658328001
[HUDI-1961] Add a debezium json integration test case for flink (#3030)
2021-06-04 15:15:32 +08:00
taylorliao
86007e9a13
[HUDI-1953] Fix NPE due to not set the output type of the operator (#3023)
Co-authored-by: enter58xuan <enter58xuan@zto.com>
2021-06-03 14:20:57 +08:00
Danny Chan
bf1cfb5635
[HUDI-1949] Refactor BucketAssigner to make it more efficient (#3017)
Add a process single class WriteProfile, the record and small files
profile re-construction can be more efficient if we reuse by same
checkpoint id.
2021-06-02 09:12:35 +08:00
yuzhaojing
bc18c39835
[FLINK-1923] Exactly-once write for flink writer (#3002)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-05-28 14:58:21 +08:00
Town
aba1eadbfc
[HUDI-1919] Type mismatch when streaming read copy_on_write table using flink (#2986)
* [HUDI-1919] Type mismatch when streaming read copy_on_write table using flink #2976
* Update ParquetSplitReaderUtil.java
2021-05-25 11:36:43 +08:00
Danny Chan
9b01d2f864
[HUDI-1915] Fix the file id for write data buffer before flushing (#2966)
2021-05-20 10:20:08 +08:00
Danny Chan
7d2971d4e2
[HUDI-1911] Reuse the partition path and file group id for flink write data buffer (#2961)
Reuse to reduce memory footprint.
2021-05-18 17:47:22 +08:00
Danny Chan
46a2399a45
[HUDI-1902] Global index for flink writer (#2958)
Supports deduplication for record keys with different partition paths.
2021-05-18 13:55:38 +08:00
Danny Chan
ad77cf42ba
[HUDI-1900] Always close the file handle for a flink mini-batch write (#2943)
Close the file handle eagerly to avoid corrupted files as much as
possible.
2021-05-14 10:25:18 +08:00
Danny Chan
b98c9ab439
[HUDI-1895] Close the file handles gracefully for flink write function to avoid corrupted files (#2938)
2021-05-12 18:44:10 +08:00
TeRS-K
be9db2c4f5
[HUDI-1055] Remove hardcoded parquet in tests (#2740)
* Remove hardcoded parquet in tests
* Use DataFileUtils.getInstance
* Renaming DataFileUtils to BaseFileUtils
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-05-11 10:01:45 -07:00
hiscat
7a5af806cf
[HUDI-1818] Validate required fields for Flink HoodieTable (#2930)
2021-05-11 11:11:19 +08:00
Danny Chan
c1b331bcff
[HUDI-1886] Avoid to generates corrupted files for flink sink (#2929)
2021-05-10 10:43:03 +08:00
Danny Chan
bfbf993cbe
[HUDI-1878] Add max memory option for flink writer task (#2920)
Also removes the rate limiter because it has similar functionality, and
modifies the create and merge handles to clean up the retry files automatically.
2021-05-08 14:27:56 +08:00
Danny Chan
528f4ca988
[HUDI-1880] Support streaming read with compaction and cleaning (#2921)
2021-05-07 20:04:35 +08:00
hiscat
0a5863939b
[HUDI-1821] Remove legacy code for Flink writer (#2868)
2021-05-07 10:58:49 +08:00
dijie
c5220b96e9
[HUDI-1781] Fix Flink streaming reader throws ClassCastException (#2900)
2021-05-01 19:13:15 +08:00
Danny Chan
6848a683bd
[HUDI-1867] Streaming read for Flink COW table (#2895)
Supports streaming read for Copy On Write table.
2021-04-29 20:44:45 +08:00
Danny Chan
6e9c5dd765
[HUDI-1863] Add rate limiter to Flink writer to avoid OOM for bootstrap (#2891)
2021-04-29 20:32:10 +08:00
Danny Chan
5be3997f70
[HUDI-1841] Tweak the min max commits to keep when setting up cleaning retain commits for Flink (#2875)
2021-04-27 10:58:06 +08:00
Danny Chan
1b27259b53
[HUDI-1844] Add option to flush when total buckets memory exceeds the threshold (#2877)
The current code flushes based on per-bucket memory usage, but the
buckets combined may still take too much memory when bootstrapping from
history data. When the threshold is hit, flush out the half of the
buckets with the larger buffer sizes.
2021-04-25 23:06:53 +08:00
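The flushing policy described in the HUDI-1844 commit message can be sketched as follows. This is a minimal Python illustration, not Hudi's Java implementation: the `BucketBuffers` class, the byte-size accounting, and rounding "half" up so that at least one bucket is flushed are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BucketBuffers:
    """Hypothetical in-memory write buffers, keyed by bucket id (sizes in bytes)."""
    max_total_bytes: int
    sizes: Dict[str, int] = field(default_factory=dict)

    def total_bytes(self) -> int:
        return sum(self.sizes.values())

    def buffer(self, bucket_id: str, record_bytes: int) -> List[str]:
        """Buffer a record; return the ids of the buckets flushed as a result."""
        self.sizes[bucket_id] = self.sizes.get(bucket_id, 0) + record_bytes
        if self.total_bytes() <= self.max_total_bytes:
            return []
        # Threshold exceeded: pick the half of the buckets with the biggest
        # buffers (assumption: rounded up, so at least one bucket is flushed).
        by_size = sorted(self.sizes, key=self.sizes.get, reverse=True)
        to_flush = by_size[: (len(by_size) + 1) // 2]
        for bucket in to_flush:
            # A real writer would write the buffered records out here.
            del self.sizes[bucket]
        return to_flush
```

Flushing the largest buffers first frees the most memory per flush while leaving the small buckets to keep accumulating records, which tends to avoid writing many tiny files.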