Danny Chan
6763b45dd4
[HUDI-2030] Add metadata cache to WriteProfile to reduce IO ( #3090 )
Keeps the same number of instant metadata entries in the cache and refreshes the cache on new commits.
2021-06-17 19:10:34 +08:00
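The caching scheme described above can be sketched as a size-bounded map from instant time to commit metadata, evicting the oldest entry when a new commit arrives. This is a minimal illustration only; the class and method names below are hypothetical, not Hudi's actual API.

```python
from collections import OrderedDict

class InstantMetadataCache:
    """Illustrative sketch: a size-bounded cache of instant time -> commit
    metadata, refreshed as new commits arrive (not Hudi's actual class)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._cache = OrderedDict()  # insertion order follows instant order

    def get(self, instant, loader):
        # Only call the loader (one round of IO) on a cache miss.
        if instant not in self._cache:
            self._cache[instant] = loader(instant)
            if len(self._cache) > self.capacity:
                self._cache.popitem(last=False)  # evict the oldest instant
        return self._cache[instant]


loads = []

def load_metadata(instant):
    loads.append(instant)            # stands in for reading the timeline file
    return {"instant": instant}

cache = InstantMetadataCache(capacity=2)
cache.get("001", load_metadata)
cache.get("001", load_metadata)      # hit: no extra IO
cache.get("002", load_metadata)
cache.get("003", load_metadata)      # new commit evicts "001"
```

Keeping the cache bounded means repeated timeline lookups avoid IO without the cache growing with the table's history.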
Danny Chan
0b57483a8e
[HUDI-2015] Fix flink operator uid to allow multiple pipelines in one job ( #3091 )
2021-06-17 09:08:19 +08:00
swuferhong
8b0a502c4f
[HUDI-2014] Support flink hive sync in batch mode ( #3081 )
2021-06-16 14:29:16 +08:00
Danny Chan
cb642ceb75
[HUDI-1999] Refresh the base file view cache for WriteProfile ( #3067 )
Refresh the view to discover new small files.
2021-06-15 08:18:38 -07:00
swuferhong
0c4f2fdc15
[HUDI-1984] Support independent flink hudi compaction function ( #3046 )
2021-06-13 15:04:46 +08:00
Danny Chan
125415a8b8
[HUDI-1994] Release the new records iterator for append handle #close ( #3058 )
2021-06-10 19:09:23 +08:00
yuzhaojing
728089a888
delete duplicate bootstrap function ( #3052 )
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-09 19:29:57 +08:00
Danny Chan
e8fcf04b57
[HUDI-1987] Fix non partition table hive meta sync for flink writer ( #3049 )
2021-06-09 14:20:04 +08:00
wangxianghu
7261f08507
[HUDI-1929] Support configure KeyGenerator by type ( #2993 )
2021-06-08 09:26:10 -04:00
yuzhaojing
cf83f10f5b
add BootstrapFunction to support index bootstrap ( #3024 )
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-08 13:55:25 +08:00
Vinay Patil
0d0dc6fb07
[HUDI-1909] Skip Commits with empty files ( #3045 )
2021-06-07 21:58:19 +08:00
Danny Chan
08464a6a5b
[HUDI-1931] BucketAssignFunction use ValueState instead of MapState ( #3026 )
Co-authored-by: loukey_7821 <854194341@qq.com>
2021-06-06 10:40:15 +08:00
yuzhaojing
c4a2ad2702
[HUDI-1954] only reset bucket when flush bucket success ( #3029 )
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-04 20:48:08 -07:00
Danny Chan
a658328001
[HUDI-1961] Add a debezium json integration test case for flink ( #3030 )
2021-06-04 15:15:32 +08:00
taylorliao
86007e9a13
[HUDI-1953] Fix NPE due to not set the output type of the operator ( #3023 )
Co-authored-by: enter58xuan <enter58xuan@zto.com>
2021-06-03 14:20:57 +08:00
Danny Chan
7fa2f8ea82
[HUDI-1921] Add target io option for flink compaction ( #2980 )
2021-06-02 10:10:35 +08:00
Danny Chan
bf1cfb5635
[HUDI-1949] Refactor BucketAssigner to make it more efficient ( #3017 )
Adds a single WriteProfile class; reconstructing the record and small-file profiles is more efficient because the profile is reused within the same checkpoint id.
2021-06-02 09:12:35 +08:00
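The reuse idea in this refactoring can be sketched as memoizing the expensive small-file scan per checkpoint id. Names here are illustrative stand-ins, not Hudi's real API surface.

```python
class WriteProfile:
    """Sketch of the reuse idea: rebuild the small-file profile only when
    the checkpoint id advances (names illustrative, not Hudi's API)."""

    def __init__(self, scan_small_files):
        self._scan = scan_small_files   # expensive IO: lists base files
        self._checkpoint_id = None
        self._small_files = None

    def small_files(self, checkpoint_id):
        # Repeated lookups within one checkpoint reuse the cached profile.
        if checkpoint_id != self._checkpoint_id:
            self._small_files = self._scan()
            self._checkpoint_id = checkpoint_id
        return self._small_files


scans = []

def scan():
    scans.append(1)
    return ["file-1", "file-2"]

profile = WriteProfile(scan)
profile.small_files(1)
profile.small_files(1)   # same checkpoint: no rescan
profile.small_files(2)   # next checkpoint: rescan once
```

Since every record in one checkpoint sees the same file system state, one scan per checkpoint is enough.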
taylorliao
83c31e356f
[HUDI-1927] Improve HoodieFlinkStreamer ( #3019 )
Co-authored-by: enter58xuan <enter58xuan@zto.com>
2021-06-01 18:35:14 +08:00
yuzhaojing
bc18c39835
[FLINK-1923] Exactly-once write for flink writer ( #3002 )
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-05-28 14:58:21 +08:00
Danny Chan
7fed7352bd
[HUDI-1865] Make embedded time line service singleton ( #2899 )
2021-05-27 13:38:33 +08:00
wangxianghu
e7020748b5
[HUDI-1920] Set archived as the default value of HOODIE_ARCHIVELOG_FOLDER_PROP_NAME ( #2978 )
2021-05-25 16:29:55 +08:00
Town
aba1eadbfc
[HUDI-1919] Type mismatch when streaming read copy_on_write table using flink ( #2986 )
* [HUDI-1919] Type mismatch when streaming read copy_on_write table using flink #2976
* Update ParquetSplitReaderUtil.java
2021-05-25 11:36:43 +08:00
zhangminglei
99b14a78e3
[HUDI-1918] Fix incorrect keyBy field causing serious data skew, to avoid multiple subtasks writing to a partition at the same time ( #2972 )
2021-05-21 21:59:47 +08:00
swuferhong
928b09ea0b
[HUDI-1871] Fix hive conf for Flink writer hive meta sync ( #2968 )
2021-05-20 17:03:52 +08:00
Danny Chan
9b01d2f864
[HUDI-1915] Fix the file id for write data buffer before flushing ( #2966 )
2021-05-20 10:20:08 +08:00
Danny Chan
7d2971d4e2
[HUDI-1911] Reuse the partition path and file group id for flink write data buffer ( #2961 )
Reuse to reduce memory footprint.
2021-05-18 17:47:22 +08:00
Danny Chan
46a2399a45
[HUDI-1902] Global index for flink writer ( #2958 )
Supports deduplication for record keys with different partition paths.
2021-05-18 13:55:38 +08:00
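The global-index behavior can be sketched as a table-wide map from record key to its first (partition, file group) location, so a key that later arrives with a different partition path is routed back to its original location instead of being written twice. The function and the file-group naming below are hypothetical illustrations.

```python
def tag_location(index, record_key, partition):
    """Sketch of a global index: each record key maps to exactly one
    (partition, file group) across the whole table, so the same key
    arriving under a new partition path updates its original location
    rather than creating a duplicate. Names are illustrative."""
    if record_key in index:
        return index[record_key]                 # existing location wins
    location = (partition, f"fg-{len(index)}")   # hypothetical file group id
    index[record_key] = location
    return location


index = {}
first = tag_location(index, "key-1", "2021/05/18")
second = tag_location(index, "key-1", "2021/05/19")  # different partition path
```

Both lookups resolve to the same location, which is what makes the index "global" rather than per-partition.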
Danny Chan
8869b3b418
[HUDI-1902] Clean the corrupted files generated by FlinkMergeAndReplaceHandle ( #2949 )
Makes the intermediate files of FlinkMergeAndReplaceHandle hidden; when committing the instant, cleans these files in case corrupted files were left behind (in the normal case, the intermediate files are cleaned by the FlinkMergeAndReplaceHandle itself).
2021-05-14 15:43:37 +08:00
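The mechanism described above amounts to writing intermediate files under a hidden name so a commit-time sweep can safely delete any leftovers. A minimal sketch, assuming a dot-prefix naming convention (the actual naming scheme is not specified here):

```python
import os
import tempfile

def sweep_intermediate_files(dir_path):
    """Sketch: intermediate files are written with a hidden (dot-prefixed)
    name, so a commit-time sweep can delete corrupted leftovers that the
    handle itself failed to clean. The convention is illustrative."""
    removed = []
    for name in sorted(os.listdir(dir_path)):
        if name.startswith("."):                  # hidden intermediate file
            os.remove(os.path.join(dir_path, name))
            removed.append(name)
    return removed


tmp = tempfile.mkdtemp()
for name in (".tmp-fg-1.parquet", "fg-1.parquet"):
    open(os.path.join(tmp, name), "w").close()
removed = sweep_intermediate_files(tmp)
```

Hiding the files also keeps readers from ever listing a half-written file as data.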
Danny Chan
ad77cf42ba
[HUDI-1900] Always close the file handle for a flink mini-batch write ( #2943 )
Close the file handle eagerly to avoid corrupted files as much as
possible.
2021-05-14 10:25:18 +08:00
Danny Chan
b98c9ab439
[HUDI-1895] Close the file handles gracefully for flink write function to avoid corrupted files ( #2938 )
2021-05-12 18:44:10 +08:00
TeRS-K
be9db2c4f5
[HUDI-1055] Remove hardcoded parquet in tests ( #2740 )
* Remove hardcoded parquet in tests
* Use DataFileUtils.getInstance
* Renaming DataFileUtils to BaseFileUtils
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-05-11 10:01:45 -07:00
hiscat
7a5af806cf
[HUDI-1818] Validate required fields for Flink HoodieTable ( #2930 )
2021-05-11 11:11:19 +08:00
hiscat
511ac4881d
[MINOR] optimize FilePathUtils ( #2931 )
2021-05-10 06:47:56 -07:00
Danny Chan
c1b331bcff
[HUDI-1886] Avoid generating corrupted files for flink sink ( #2929 )
2021-05-10 10:43:03 +08:00
Danny Chan
bfbf993cbe
[HUDI-1878] Add max memory option for flink writer task ( #2920 )
Also removes the rate limiter, since this option provides similar functionality, and modifies the create and merge handles to clean the retry files automatically.
2021-05-08 14:27:56 +08:00
Danny Chan
528f4ca988
[HUDI-1880] Support streaming read with compaction and cleaning ( #2921 )
2021-05-07 20:04:35 +08:00
hiscat
0a5863939b
[HUDI-1821] Remove legacy code for Flink writer ( #2868 )
2021-05-07 10:58:49 +08:00
dijie
c5220b96e9
[HUDI-1781] Fix Flink streaming reader throws ClassCastException ( #2900 )
2021-05-01 19:13:15 +08:00
Danny Chan
6848a683bd
[HUDI-1867] Streaming read for Flink COW table ( #2895 )
Supports streaming read for Copy On Write table.
2021-04-29 20:44:45 +08:00
Danny Chan
6e9c5dd765
[HUDI-1863] Add rate limiter to Flink writer to avoid OOM for bootstrap ( #2891 )
2021-04-29 20:32:10 +08:00
hiscat
63fa2b6186
[HUDI-1836] Logging consuming instant to StreamReadOperator#processSplits ( #2867 )
2021-04-27 14:00:59 +08:00
Danny Chan
5be3997f70
[HUDI-1841] Tweak the min max commits to keep when setting up cleaning retain commits for Flink ( #2875 )
2021-04-27 10:58:06 +08:00
Danny Chan
d047e91d86
[HUDI-1837] Add optional instant range to log record scanner for log ( #2870 )
2021-04-26 16:53:18 +08:00
Danny Chan
1b27259b53
[HUDI-1844] Add option to flush when total buckets memory exceeds the threshold ( #2877 )
The current code supports flushing based on per-bucket memory usage, but the buckets may still take too much memory when bootstrapping from historical data. When the total threshold is hit, flush out the half of the buckets with the larger buffer sizes.
2021-04-25 23:06:53 +08:00
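The flush policy described in the body can be sketched as: check the combined buffered size against the total threshold and, if exceeded, pick the larger half of the buckets for flushing. A minimal sketch with hypothetical names:

```python
def pick_buckets_to_flush(buffer_sizes, total_threshold):
    """Sketch of the policy above: once combined buffered bytes exceed
    the total threshold, flush the half of the buckets holding the
    larger buffers (names and units are illustrative)."""
    if sum(buffer_sizes.values()) <= total_threshold:
        return []                                # under budget, keep buffering
    ranked = sorted(buffer_sizes, key=buffer_sizes.get, reverse=True)
    return ranked[: max(1, len(ranked) // 2)]    # the bigger half


sizes = {"bucket-a": 10, "bucket-b": 5, "bucket-c": 1, "bucket-d": 2}
to_flush = pick_buckets_to_flush(sizes, total_threshold=12)
```

Flushing the larger half frees the most memory per flush while letting small, still-filling buckets keep accumulating records.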
Danny Chan
a5789c4067
[HUDI-1829] Use while loop instead of recursive call in MergeOnReadInputFormat to avoid StackOverflow ( #2862 )
Recursive calls risk a StackOverflowError when there are too many of them.
2021-04-23 09:59:36 +08:00
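The transformation in this fix is the standard recursion-to-loop rewrite: a skip step that recursed once per rejected record becomes a while loop with constant stack depth. A generic sketch (not the MergeOnReadInputFormat code itself):

```python
# Recursive form (one stack frame per skipped record -- the risk the fix removes):
#   def next_kept_record(read, keep):
#       record = read()
#       if record is None or keep(record):
#           return record
#       return next_kept_record(read, keep)

def next_kept_record(read, keep):
    """Iterative form: skipping records in a while loop needs constant
    stack depth, where the recursive form adds a frame per skipped record."""
    while True:
        record = read()
        if record is None or keep(record):
            return record


records = iter([1, 2, 3, 4, None])
read = lambda: next(records)
result = next_kept_record(read, keep=lambda r: r % 4 == 0)
```

With many consecutive records to skip, only the iterative form stays within a fixed stack budget.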
hiscat
cc81ddde01
[HUDI-1812] Add explicit index state TTL option for Flink writer ( #2853 )
2021-04-21 20:13:30 +08:00
Danny Chan
ac3589f006
[HUDI-1814] Non partitioned table for Flink writer ( #2859 )
2021-04-21 20:07:27 +08:00
Danny Chan
d6d52c6063
[HUDI-1809] Flink merge on read input split uses wrong base file path for default merge type ( #2846 )
2021-04-20 21:27:09 +08:00
hj2016
62b8a341dd
[HUDI-1792] flink-client query error when processing files larger than 128mb ( #2814 )
Co-authored-by: huangjing <huangjing@clinbrain.com>
2021-04-16 13:59:19 +08:00
Danny Chan
b6d949b48a
[HUDI-1801] FlinkMergeHandle rolling over may miss to rename the latest file handle ( #2831 )
The FlinkMergeHandle may rename the (N-1)-th file handle instead of the latest one, thus causing data duplication.
2021-04-16 11:40:53 +08:00