wangxianghu
9e61dad597
[MINOR] Drop duplicate keygenerator class configuration setting ( #3167 )
2021-06-28 17:11:32 +08:00
Danny Chan
d24341d10c
[HUDI-2074] Use while loop instead of recursive call in MergeOnReadInputFormat#MergeIterator to avoid StackOverflow ( #3159 )
2021-06-28 16:03:10 +08:00
wangxianghu
f73bedd374
[MINOR] Remove unused methods ( #3152 )
2021-06-26 13:19:26 +08:00
Danny Chan
e64fe55054
[HUDI-2068] Skip the assign state for SmallFileAssign when the state can not assign initially ( #3148 )
2021-06-25 08:57:56 +08:00
yuzhaojing
218f2a6df8
[HUDI-2062] Catch FileNotFoundException in WriteProfiles #getCommitMetadata Safely ( #3138 )
...
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-25 08:54:59 +08:00
yuzhaojing
380518e232
[HUDI-2038] Support rollback inflight compaction instances for CompactionPlanOperator ( #3105 )
...
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-23 20:58:52 +08:00
Danny Chan
2687eab8f0
[HUDI-2054] Remove the duplicate name for flink write pipeline ( #3135 )
2021-06-23 14:49:38 +08:00
yuzhaojing
5db37c255b
[HUDI-2047] Ignore FileNotFoundException in WriteProfiles #getWritePathsOfInstant ( #3125 )
...
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-22 14:18:46 +08:00
swuferhong
f8d9242372
[HUDI-2050] Support rollback inflight compaction instances for batch flink compactor ( #3124 )
2021-06-21 20:32:48 +08:00
Danny Chan
adf167991a
[HUDI-2049] StreamWriteFunction should wait for the next inflight instant time before flushing ( #3123 )
2021-06-21 20:15:27 +08:00
Danny Chan
cdb9b48170
[HUDI-2040] Make flink writer as exactly-once by default ( #3106 )
2021-06-18 13:55:23 +08:00
Danny Chan
aa6342c3c9
[HUDI-2036] Move the compaction plan scheduling out of flink writer coordinator ( #3101 )
...
Since HUDI-1955 was fixed, we can move the scheduling out of the
coordinator to make the coordinator more lightweight.
2021-06-18 09:35:09 +08:00
vinoyang
67c3124352
[HUDI-2032] Make keygen class and keygen type optional for FlinkStreamerConfig ( #3104 )
...
* [HUDI-2032] Make keygen class and keygen type optional for FlinkStreamerConfig
* Address the review suggestion
2021-06-17 21:22:13 +08:00
yuzhaojing
f97dd25d41
[HUDI-2019] Set up the file system view storage config for singleton embedded server write config every time ( #3102 )
...
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-17 20:28:03 +08:00
Danny Chan
6763b45dd4
[HUDI-2030] Add metadata cache to WriteProfile to reduce IO ( #3090 )
...
Keeps the same number of instant metadata entries cached and refreshes
the cache on new commits.
2021-06-17 19:10:34 +08:00
Danny Chan
0b57483a8e
[HUDI-2015] Fix flink operator uid to allow multiple pipelines in one job ( #3091 )
2021-06-17 09:08:19 +08:00
swuferhong
8b0a502c4f
[HUDI-2014] Support flink hive sync in batch mode ( #3081 )
2021-06-16 14:29:16 +08:00
Danny Chan
cb642ceb75
[HUDI-1999] Refresh the base file view cache for WriteProfile ( #3067 )
...
Refresh the view to discover new small files.
2021-06-15 08:18:38 -07:00
swuferhong
0c4f2fdc15
[HUDI-1984] Support independent flink hudi compaction function ( #3046 )
2021-06-13 15:04:46 +08:00
Danny Chan
125415a8b8
[HUDI-1994] Release the new records iterator for append handle #close ( #3058 )
2021-06-10 19:09:23 +08:00
yuzhaojing
728089a888
delete duplicate bootstrap function ( #3052 )
...
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-09 19:29:57 +08:00
Danny Chan
e8fcf04b57
[HUDI-1987] Fix non partition table hive meta sync for flink writer ( #3049 )
2021-06-09 14:20:04 +08:00
wangxianghu
7261f08507
[HUDI-1929] Support configure KeyGenerator by type ( #2993 )
2021-06-08 09:26:10 -04:00
yuzhaojing
cf83f10f5b
add BootstrapFunction to support index bootstrap ( #3024 )
...
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-08 13:55:25 +08:00
Vinay Patil
0d0dc6fb07
[HUDI-1909] Skip Commits with empty files ( #3045 )
2021-06-07 21:58:19 +08:00
Danny Chan
08464a6a5b
[HUDI-1931] BucketAssignFunction use ValueState instead of MapState ( #3026 )
...
Co-authored-by: 854194341@qq.com <loukey_7821>
2021-06-06 10:40:15 +08:00
yuzhaojing
c4a2ad2702
[HUDI-1954] only reset bucket when flush bucket success ( #3029 )
...
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-04 20:48:08 -07:00
Danny Chan
a658328001
[HUDI-1961] Add a debezium json integration test case for flink ( #3030 )
2021-06-04 15:15:32 +08:00
taylorliao
86007e9a13
[HUDI-1953] Fix NPE due to not set the output type of the operator ( #3023 )
...
Co-authored-by: enter58xuan <enter58xuan@zto.com>
2021-06-03 14:20:57 +08:00
Danny Chan
7fa2f8ea82
[HUDI-1921] Add target io option for flink compaction ( #2980 )
2021-06-02 10:10:35 +08:00
Danny Chan
bf1cfb5635
[HUDI-1949] Refactor BucketAssigner to make it more efficient ( #3017 )
...
Add a process-singleton class WriteProfile; re-constructing the record
and small-file profiles is more efficient if we reuse them within the
same checkpoint id.
2021-06-02 09:12:35 +08:00
taylorliao
83c31e356f
[HUDI-1927] Improve HoodieFlinkStreamer ( #3019 )
...
Co-authored-by: enter58xuan <enter58xuan@zto.com>
2021-06-01 18:35:14 +08:00
yuzhaojing
bc18c39835
[FLINK-1923] Exactly-once write for flink writer ( #3002 )
...
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-05-28 14:58:21 +08:00
Danny Chan
7fed7352bd
[HUDI-1865] Make embedded time line service singleton ( #2899 )
2021-05-27 13:38:33 +08:00
wangxianghu
e7020748b5
[HUDI-1920] Set archived as the default value of HOODIE_ARCHIVELOG_FOLDER_PROP_NAME ( #2978 )
2021-05-25 16:29:55 +08:00
Town
aba1eadbfc
[HUDI-1919] Type mismatch when streaming read copy_on_write table using flink ( #2986 )
...
* [HUDI-1919] Type mismatch when streaming read copy_on_write table using flink #2976
* Update ParquetSplitReaderUtil.java
2021-05-25 11:36:43 +08:00
zhangminglei
99b14a78e3
[HUDI-1918] Fix incorrect keyBy field cause serious data skew, to avoid multiple subtasks write to a partition at the same time ( #2972 )
2021-05-21 21:59:47 +08:00
swuferhong
928b09ea0b
[HUDI-1871] Fix hive conf for Flink writer hive meta sync ( #2968 )
2021-05-20 17:03:52 +08:00
Danny Chan
9b01d2f864
[HUDI-1915] Fix the file id for write data buffer before flushing ( #2966 )
2021-05-20 10:20:08 +08:00
Danny Chan
7d2971d4e2
[HUDI-1911] Reuse the partition path and file group id for flink write data buffer ( #2961 )
...
Reuse to reduce memory footprint.
2021-05-18 17:47:22 +08:00
Danny Chan
46a2399a45
[HUDI-1902] Global index for flink writer ( #2958 )
...
Supports deduplication for record keys with different partition paths.
2021-05-18 13:55:38 +08:00
Danny Chan
8869b3b418
[HUDI-1902] Clean the corrupted files generated by FlinkMergeAndReplaceHandle ( #2949 )
...
Make the intermediate files of FlinkMergeAndReplaceHandle hidden. When
committing the instant, clean these files in case some corrupted files
were left behind (in the normal case, the intermediate files are cleaned
by the FlinkMergeAndReplaceHandle itself).
2021-05-14 15:43:37 +08:00
Danny Chan
ad77cf42ba
[HUDI-1900] Always close the file handle for a flink mini-batch write ( #2943 )
...
Close the file handle eagerly to avoid corrupted files as much as
possible.
2021-05-14 10:25:18 +08:00
Danny Chan
b98c9ab439
[HUDI-1895] Close the file handles gracefully for flink write function to avoid corrupted files ( #2938 )
2021-05-12 18:44:10 +08:00
TeRS-K
be9db2c4f5
[HUDI-1055] Remove hardcoded parquet in tests ( #2740 )
...
* Remove hardcoded parquet in tests
* Use DataFileUtils.getInstance
* Renaming DataFileUtils to BaseFileUtils
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-05-11 10:01:45 -07:00
hiscat
7a5af806cf
[HUDI-1818] Validate required fields for Flink HoodieTable ( #2930 )
2021-05-11 11:11:19 +08:00
hiscat
511ac4881d
[MINOR] optimize FilePathUtils ( #2931 )
2021-05-10 06:47:56 -07:00
Danny Chan
c1b331bcff
[HUDI-1886] Avoid to generates corrupted files for flink sink ( #2929 )
2021-05-10 10:43:03 +08:00
Danny Chan
bfbf993cbe
[HUDI-1878] Add max memory option for flink writer task ( #2920 )
...
Also removes the rate limiter because it has similar functionality, and
modifies the create and merge handles to clean the retry files automatically.
2021-05-08 14:27:56 +08:00
Danny Chan
528f4ca988
[HUDI-1880] Support streaming read with compaction and cleaning ( #2921 )
2021-05-07 20:04:35 +08:00