223 Commits

Author SHA1 Message Date
vinoth chandar
d07def1290 [MINOR] Fix broken build due to FlinkOptions (#3198) 2021-06-30 20:34:58 -07:00
wenningd
d412fb2fe6 [HUDI-89] Add configOption & refactor all configs based on that (#2833)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-06-30 14:26:30 -07:00
yuzhaojing
07e93de8b4 [HUDI-2052] Support load logFile in BootstrapFunction (#3134)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-30 20:37:00 +08:00
yuzhaojing
1cbf43b6e7 [HUDI-2103] Add rebalance before index bootstrap (#3185)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-30 16:40:55 +08:00
wangxianghu
202887b8ca [HUDI-2092] Fix NPE caused by FlinkStreamerConfig#writePartitionUrlEncode null value (#3176) 2021-06-30 09:21:06 +08:00
swuferhong
f665db071f [HUDI-2085] Support specify compaction parallelism and compaction target io for flink batch compaction (#3169) 2021-06-29 22:53:01 +08:00
swuferhong
5a7d1b3d6c [HUDI-2097] Fix Flink unable to read commit metadata error (#3180) 2021-06-29 22:43:47 +08:00
Danny Chan
b8a8f572d6 [HUDI-2094] Supports hive style partitioning for flink writer (#3178) 2021-06-29 15:34:26 +08:00
yuzhaojing
37b7c65d8a [HUDI-2084] Resend the uncommitted write metadata when start up (#3168)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-29 08:53:52 +08:00
Vinay Patil
34fc8a8880 [HUDI-2067] Sync FlinkOptions config to FlinkStreamerConfig (#3151) 2021-06-28 19:26:08 +08:00
wangxianghu
9e61dad597 [MINOR] Drop duplicate keygenerator class configuration setting (#3167) 2021-06-28 17:11:32 +08:00
Danny Chan
d24341d10c [HUDI-2074] Use while loop instead of recursive call in MergeOnReadInputFormat#MergeIterator to avoid StackOverflow (#3159) 2021-06-28 16:03:10 +08:00
wangxianghu
f73bedd374 [MINOR] Remove unused methods (#3152) 2021-06-26 13:19:26 +08:00
Danny Chan
e64fe55054 [HUDI-2068] Skip the assign state for SmallFileAssign when the state can not assign initially (#3148) 2021-06-25 08:57:56 +08:00
yuzhaojing
218f2a6df8 [HUDI-2062] Catch FileNotFoundException in WriteProfiles #getCommitMetadata Safely (#3138)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-25 08:54:59 +08:00
yuzhaojing
380518e232 [HUDI-2038] Support rollback inflight compaction instances for CompactionPlanOperator (#3105)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-23 20:58:52 +08:00
Danny Chan
2687eab8f0 [HUDI-2054] Remove the duplicate name for flink write pipeline (#3135) 2021-06-23 14:49:38 +08:00
yuzhaojing
5db37c255b [HUDI-2047] Ignore FileNotFoundException in WriteProfiles #getWritePathsOfInstant (#3125)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-22 14:18:46 +08:00
swuferhong
f8d9242372 [HUDI-2050] Support rollback inflight compaction instances for batch flink compactor (#3124) 2021-06-21 20:32:48 +08:00
Danny Chan
adf167991a [HUDI-2049] StreamWriteFunction should wait for the next inflight instant time before flushing (#3123) 2021-06-21 20:15:27 +08:00
Danny Chan
cdb9b48170 [HUDI-2040] Make flink writer as exactly-once by default (#3106) 2021-06-18 13:55:23 +08:00
Danny Chan
aa6342c3c9 [HUDI-2036] Move the compaction plan scheduling out of flink writer coordinator (#3101)
Since HUDI-1955 was fixed, we can move the scheduling out of the
coordinator to make the coordinator more lightweight.
2021-06-18 09:35:09 +08:00
vinoyang
67c3124352 [HUDI-2032] Make keygen class and keygen type optional for FlinkStreamerConfig (#3104)
* [HUDI-2032] Make keygen class and keygen type optional for FlinkStreamerConfig

* Address the review suggestion
2021-06-17 21:22:13 +08:00
yuzhaojing
f97dd25d41 [HUDI-2019] Set up the file system view storage config for singleton embedded server write config every time (#3102)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-17 20:28:03 +08:00
Danny Chan
6763b45dd4 [HUDI-2030] Add metadata cache to WriteProfile to reduce IO (#3090)
Keeps the same number of instant metadata entries in the cache and
refreshes the cache on new commits.
2021-06-17 19:10:34 +08:00
Danny Chan
0b57483a8e [HUDI-2015] Fix flink operator uid to allow multiple pipelines in one job (#3091) 2021-06-17 09:08:19 +08:00
swuferhong
8b0a502c4f [HUDI-2014] Support flink hive sync in batch mode (#3081) 2021-06-16 14:29:16 +08:00
Danny Chan
cb642ceb75 [HUDI-1999] Refresh the base file view cache for WriteProfile (#3067)
Refresh the view to discover new small files.
2021-06-15 08:18:38 -07:00
swuferhong
0c4f2fdc15 [HUDI-1984] Support independent flink hudi compaction function (#3046) 2021-06-13 15:04:46 +08:00
Danny Chan
125415a8b8 [HUDI-1994] Release the new records iterator for append handle #close (#3058) 2021-06-10 19:09:23 +08:00
yuzhaojing
728089a888 delete duplicate bootstrap function (#3052)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-09 19:29:57 +08:00
Danny Chan
e8fcf04b57 [HUDI-1987] Fix non partition table hive meta sync for flink writer (#3049) 2021-06-09 14:20:04 +08:00
wangxianghu
7261f08507 [HUDI-1929] Support configure KeyGenerator by type (#2993) 2021-06-08 09:26:10 -04:00
yuzhaojing
cf83f10f5b add BootstrapFunction to support index bootstrap (#3024)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-08 13:55:25 +08:00
Vinay Patil
0d0dc6fb07 [HUDI-1909] Skip Commits with empty files (#3045) 2021-06-07 21:58:19 +08:00
Danny Chan
08464a6a5b [HUDI-1931] BucketAssignFunction use ValueState instead of MapState (#3026)
Co-authored-by: 854194341@qq.com <loukey_7821>
2021-06-06 10:40:15 +08:00
yuzhaojing
c4a2ad2702 [HUDI-1954] only reset bucket when flush bucket success (#3029)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-04 20:48:08 -07:00
Danny Chan
a658328001 [HUDI-1961] Add a debezium json integration test case for flink (#3030) 2021-06-04 15:15:32 +08:00
taylorliao
86007e9a13 [HUDI-1953] Fix NPE due to not set the output type of the operator (#3023)
Co-authored-by: enter58xuan <enter58xuan@zto.com>
2021-06-03 14:20:57 +08:00
Danny Chan
7fa2f8ea82 [HUDI-1921] Add target io option for flink compaction (#2980) 2021-06-02 10:10:35 +08:00
Danny Chan
bf1cfb5635 [HUDI-1949] Refactor BucketAssigner to make it more efficient (#3017)
Add a process-singleton class, WriteProfile; reconstructing the record
and small-file profiles is more efficient when they are reused within
the same checkpoint id.
2021-06-02 09:12:35 +08:00
taylorliao
83c31e356f [HUDI-1927] Improve HoodieFlinkStreamer (#3019)
Co-authored-by: enter58xuan <enter58xuan@zto.com>
2021-06-01 18:35:14 +08:00
yuzhaojing
bc18c39835 [FLINK-1923] Exactly-once write for flink writer (#3002)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-05-28 14:58:21 +08:00
Danny Chan
7fed7352bd [HUDI-1865] Make embedded time line service singleton (#2899) 2021-05-27 13:38:33 +08:00
wangxianghu
e7020748b5 [HUDI-1920] Set archived as the default value of HOODIE_ARCHIVELOG_FOLDER_PROP_NAME (#2978) 2021-05-25 16:29:55 +08:00
Town
aba1eadbfc [HUDI-1919] Type mismatch when streaming read copy_on_write table using flink (#2986)
* [HUDI-1919] Type mismatch when streaming read copy_on_write table using flink #2976

* Update ParquetSplitReaderUtil.java
2021-05-25 11:36:43 +08:00
zhangminglei
99b14a78e3 [HUDI-1918] Fix incorrect keyBy field cause serious data skew, to avoid multiple subtasks write to a partition at the same time (#2972) 2021-05-21 21:59:47 +08:00
swuferhong
928b09ea0b [HUDI-1871] Fix hive conf for Flink writer hive meta sync (#2968) 2021-05-20 17:03:52 +08:00
Danny Chan
9b01d2f864 [HUDI-1915] Fix the file id for write data buffer before flushing (#2966) 2021-05-20 10:20:08 +08:00
Danny Chan
7d2971d4e2 [HUDI-1911] Reuse the partition path and file group id for flink write data buffer (#2961)
Reuse to reduce memory footprint.
2021-05-18 17:47:22 +08:00