Danny Chan
abf3e3fe71
[HUDI-2548] Flink streaming reader misses the rolling over file handles ( #3787 )
2021-10-14 10:36:18 +08:00
Sivabalan Narayanan
cff384d23f
[HUDI-2552] Fixing some test failures to unblock broken CI master ( #3793 )
2021-10-13 18:44:43 -04:00
Danny Chan
ad63938890
[HUDI-2537] Fix metadata table for flink ( #3774 )
2021-10-10 09:30:39 +08:00
Sivabalan Narayanan
5f32162a2f
[HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426 ( #3590 )
* [HUDI-2285] Adding synchronous updates to metadata before completion of commits in data timeline.
- This patch adds synchronous updates to the metadata table. In other words, every write is first committed to the metadata table and then to the data table. While reading the metadata table, we ignore any delta commits that are present only in the metadata table and not in the data table timeline.
- Compaction of the metadata table is fenced by the condition that we trigger compaction only when there are no inflight requests in the data table. This ensures that all base files in the metadata table are always in sync with the data table (w/o any holes); the only possible discrepancy is some extra invalid commits among the delta log files in the metadata table.
- Due to this, archival of the data table also fences itself up to the compacted instant in the metadata table.
All writes to the metadata table happen within the data table lock, so the metadata table works in single-writer mode only. This might be tough to loosen, since all writers write to the same FILES partition and so will result in a conflict anyway.
- As part of this, have added lock acquisition in the data table for those operations that did not previously take a lock while committing (rollback, clean, compaction, cluster). To note, no conflict resolution is done here; all we do is commit while holding a lock, so that writes to the metadata table always come from a single writer.
- Also added a building block to add buckets for partitions, which will be leveraged by other indexes like the record level index, etc. For now, the FILES partition has only one bucket. In general, any number of buckets per partition is allowed, and each partition has a fixed fileId prefix with an incremental suffix for each bucket within the partition.
Have fixed [HUDI-2476]. This fix is about retrying a failed compaction if it succeeded in the metadata table the first time but failed w/ the data table.
- Enabling metadata table by default.
- Adding more tests for metadata table.
Co-authored-by: Prashant Wason <pwason@uber.com>
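The rules described in the [HUDI-2285] message can be illustrated with a minimal Python sketch. This is not Hudi code: the function names, the plain-string instant times, and the zero-padded bucket suffix format are all assumptions made for illustration only.

```python
from typing import Iterable, List, Set


def visible_metadata_commits(metadata_delta_commits: Iterable[str],
                             data_completed_instants: Iterable[str]) -> List[str]:
    """Reader-side filter: keep only metadata delta commits whose instant
    also completed on the data table timeline (hypothetical helper)."""
    completed: Set[str] = set(data_completed_instants)
    return [instant for instant in metadata_delta_commits if instant in completed]


def can_compact_metadata(data_inflight_instants: Iterable[str]) -> bool:
    """Compaction fence: allow metadata compaction only when the data table
    has no inflight instants (hypothetical helper)."""
    return len(list(data_inflight_instants)) == 0


def bucket_file_id(partition_prefix: str, bucket_index: int) -> str:
    """Fixed per-partition prefix plus an incremental suffix per bucket.
    The 4-digit zero padding is an assumed format, not taken from the source."""
    return f"{partition_prefix}{bucket_index:04d}"
```

For example, with metadata delta commits `["001", "002", "003"]` and completed data instants `["001", "002"]`, the filter drops `"003"`, which exists only in the metadata table.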
2021-10-06 00:17:52 -04:00
Carl-Zhou-CN
aa546554ff
[HUDI-2451] On windows client with hdfs server for wrong file separator ( #3687 )
Co-authored-by: yao.zhou <yao.zhou@linkflowtech.com>
2021-09-26 21:51:27 +08:00
Danny Chan
31a301f0aa
[HUDI-2485] Consume as mini-batch for flink stream reader ( #3710 )
2021-09-24 23:44:01 +08:00
Danny Chan
440525ccbb
[HUDI-2483] Infer changelog mode for flink compactor ( #3706 )
2021-09-24 14:52:27 +08:00
Danny Chan
3354fac42f
[HUDI-2449] Incremental read for Flink ( #3686 )
2021-09-19 09:06:46 +08:00
Danny Chan
627f20f9c5
[HUDI-2430] Make decimal compatible with hudi for flink writer ( #3658 )
2021-09-15 12:04:46 +08:00
Danny Chan
b30c5bdaef
[HUDI-2412] Add timestamp based partitioning for flink writer ( #3638 )
2021-09-11 13:17:16 +08:00
Danny Chan
db2ab9a150
[HUDI-2403] Add metadata table listing for flink query source ( #3618 )
2021-09-08 14:52:39 +08:00
yuzhaojing
7a1bd225ca
[HUDI-2376] Add pipeline for Append mode ( #3573 )
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-09-02 16:32:40 +08:00
Danny Chan
f66e1ce9bf
[HUDI-2379] Include the pending compaction file groups for flink ( #3567 )
streaming reader
2021-09-01 16:47:52 +08:00
Danny Chan
57668d02a0
[HUDI-2371] Improvement flink streaming reader ( #3552 )
- Support reading empty table
- Fix filtering by partition path
- Support reading from earliest commit
2021-08-28 20:16:54 +08:00
mikewu
9850e90e2e
[HUDI-2229] Refact HoodieFlinkStreamer to reuse the pipeline of HoodieTableSink ( #3495 )
Co-authored-by: mikewu <xingbo.wxb@alibaba-inc.com>
2021-08-27 10:14:04 +08:00
yuzhaojing
ab3fbb8895
[HUDI-2342] Optimize Bootstrap operator ( #3516 )
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-08-21 20:03:03 +08:00
Danny Chan
c7c517f14c
[HUDI-2340] Merge the data set for flink bounded source when changelog mode turns off ( #3513 )
2021-08-21 07:21:35 +08:00
Udit Mehrotra
c350d05dd3
Restore 0.8.0 config keys with deprecated annotation ( #3506 )
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-19 13:36:40 -07:00
Danny Chan
9762e4c08c
[MINOR] Some cosmetic changes for Flink ( #3503 )
2021-08-19 23:21:20 +08:00
swuferhong
1fed44af84
[HUDI-2316] Support Flink batch upsert ( #3494 )
2021-08-19 17:15:26 +08:00
Danny Chan
66f951322a
[HUDI-2191] Bump flink version to 1.13.1 ( #3291 )
2021-08-16 18:14:05 +08:00
Danny Chan
29332498af
[HUDI-2298] The HoodieMergedLogRecordScanner should set up the operation of the chosen record ( #3456 )
2021-08-11 22:55:43 +08:00
swuferhong
21db6d7a84
[HUDI-1771] Propagate CDC format for hoodie ( #3285 )
2021-08-10 20:23:23 +08:00
Danny Chan
b7586a5632
[HUDI-2274] Allows INSERT duplicates for Flink MOR table ( #3403 )
2021-08-06 10:30:52 +08:00
yuzhaojing
b8b9d6db83
[HUDI-2087] Support Append only in Flink stream ( #3390 )
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-08-04 17:53:20 +08:00
Danny Chan
02331fc223
[HUDI-2258] Metadata table for flink ( #3381 )
2021-08-04 10:54:55 +08:00
swuferhong
f7f5d4cc6d
[HUDI-2184] Support setting hive sync partition extractor class based on flink configuration ( #3284 )
2021-07-30 17:24:00 +08:00
Danny Chan
c4e45a0010
[HUDI-2254] Builtin sort operator for flink bulk insert ( #3372 )
2021-07-30 16:58:11 +08:00
swuferhong
8b19ec9ca0
[HUDI-2252] Default consumes from the latest instant for flink streaming reader ( #3368 )
2021-07-30 14:25:05 +08:00
Danny Chan
91c2213412
[HUDI-2245] BucketAssigner generates the fileId evenly to avoid data skew ( #3362 )
2021-07-28 19:26:37 +08:00
rmahindra123
8fef50e237
[HUDI-2044] Integrate consumers with rocksDB and compression within External Spillable Map ( #3318 )
2021-07-28 01:31:03 -04:00
Danny Chan
9d2a65a6a6
[HUDI-2209] Bulk insert for flink writer ( #3334 )
2021-07-27 10:58:23 +08:00
pengzhiwei
2c910ee3af
[HUDI-2212] Missing PrimaryKey In Hoodie Properties For CTAS Table ( #3332 )
2021-07-23 15:21:57 +08:00
Danny Chan
c89bf1de20
[HUDI-2205] Rollback inflight compaction for flink writer ( #3320 )
2021-07-22 22:56:51 +08:00
Danny Chan
858e84b5b2
[HUDI-2198] Clean and reset the bootstrap events for coordinator when task failover ( #3304 )
2021-07-21 10:13:05 +08:00
yuzhaojing
634163a990
[HUDI-2145] Create new bucket when NewFileAssignState filled ( #3258 )
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-07-20 17:46:45 +08:00
喻兆靖
2099bf41db
[HUDI-2193] Remove state in BootstrapFunction
2021-07-19 18:14:06 +08:00
yuzhao.cyz
50c2b76d72
Revert "[HUDI-2087] Support Append only in Flink stream ( #3252 )"
This reverts commit 783c9cb3
2021-07-16 21:36:27 +08:00
yuzhao.cyz
c8aaf00819
[HUDI-2185] Remove the default parallelism of index bootstrap and bucket assigner
2021-07-16 15:44:15 +08:00
vinoyang
52524b659d
[HUDI-2165] Support Transformer for HoodieFlinkStreamer ( #3270 )
* [HUDI-2165] Support Transformer for HoodieFlinkStreamer
2021-07-14 23:01:52 +08:00
yuzhaojing
783c9cb369
[HUDI-2087] Support Append only in Flink stream ( #3252 )
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-07-10 14:49:35 +08:00
vinoyang
7c6eebf98c
[MINOR] Fix some wrong assert reasons ( #3248 )
2021-07-10 14:35:40 +08:00
vinoth chandar
b4562e86e4
Revert "[HUDI-2087] Support Append only in Flink stream ( #3174 )" ( #3251 )
This reverts commit 371526789d.
2021-07-09 11:20:09 -07:00
yuzhaojing
371526789d
[HUDI-2087] Support Append only in Flink stream ( #3174 )
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-07-09 16:06:32 +08:00
wangxianghu
f2621da32f
[HUDI-2093] Fix empty avro schema path caused by duplicate parameters ( #3177 )
* [HUDI-2093] Fix empty avro schema path caused by duplicate parameters
* rename schema option key
* fix doc
* rename var name
2021-07-06 15:14:30 +08:00
Danny Chan
32bd8ce088
[HUDI-2132] Make coordinator events as POJO for efficient serialization ( #3223 )
2021-07-06 09:02:38 +08:00
Danny Chan
e6ee7bdb51
[HUDI-2129] StreamerUtil.medianInstantTime should return a valid date time string ( #3221 )
2021-07-05 20:56:24 +08:00
Danny Chan
7462fdefc3
[HUDI-2112] Support reading pure logs file group for flink batch reader after compaction ( #3202 )
2021-07-02 16:29:22 +08:00
pengzhiwei
b34d53fa9c
[HUDI-2088] Missing Partition Fields And PreCombineField In Hoodie Properties For Table Written By Flink ( #3171 )
2021-07-01 17:25:18 +08:00
yuzhaojing
07e93de8b4
[HUDI-2052] Support load logFile in BootstrapFunction ( #3134 )
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-30 20:37:00 +08:00