Commit Graph

148 Commits

Author SHA1 Message Date
Town
4b0111974f [HUDI-3184] hudi-flink support timestamp-micros (#4548)
* support both avro and parquet code paths
* string rowdata conversion is also supported
2022-01-12 10:53:51 +08:00
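Avro's `timestamp-micros` logical type encodes an instant as microseconds since the epoch in a long; a minimal sketch of the conversion a writer has to perform (the helper below is illustrative, not Hudi's actual code):

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class MicrosDemo {
    // avro timestamp-micros encodes an instant as microseconds since the epoch
    static LocalDateTime fromMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        int nanos = (int) (Math.floorMod(micros, 1_000_000L) * 1_000L);
        return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        // 2022-01-12T02:53:51Z expressed in microseconds
        System.out.println(fromMicros(1_641_956_031_000_000L));
    }
}
```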
Sagar Sumit
827549949c [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator (#4203)
* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

The timestamp-based key generator was returning different values for the row-writer and non-row-writer paths. This patch fixes that; the fix is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`).
2022-01-08 10:22:44 -05:00
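The guard flag named above is supplied like any other write option; a hedged sketch of building the option map (only the flag key comes from the commit message, everything else is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class KeyGenFlagDemo {
    // hypothetical helper assembling Hudi write options; only the flag key
    // below is taken from the commit message, the rest is illustrative
    static Map<String, String> buildWriteOptions() {
        Map<String, String> options = new HashMap<>();
        options.put(
            "hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled",
            "true"); // make row-writer and non-row-writer key generation agree
        return options;
    }

    public static void main(String[] args) {
        System.out.println(buildWriteOptions());
    }
}
```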
fengli
205e48f53f [HUDI-3132] Minor fixes for HoodieCatalog
close apache/hudi#4486
2022-01-06 11:17:23 +08:00
Ron
674c149234 [HUDI-3083] Support component data types for flink bulk_insert (#4470)
* [HUDI-3083] Support component data types for flink bulk_insert

* add nested row type test
2021-12-30 11:15:54 +08:00
yuzhaojing
15eb7e81fc [HUDI-2547] Schedule Flink compaction in service (#4254)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-12-22 15:08:47 +08:00
Danny Chan
d0087d4040 [HUDI-3037] Add back remote view storage config for flink (#4338) 2021-12-17 13:57:53 +08:00
Fugle666
29bc5fd912 [HUDI-2996] Flink streaming reader 'skip_compaction' option does not work (#4304)
close apache/hudi#4304
2021-12-14 12:21:09 +08:00
Danny Chan
e8473b9a2b [HUDI-2951] Disable remote view storage config for flink (#4237) 2021-12-07 18:04:15 +08:00
Ron
a8fb69656f [HUDI-2877] Support flink catalog to help user use flink table conveniently (#4153)
* [HUDI-2877] Support flink catalog to help user use flink table conveniently

* Fix comment

* fix comment2
2021-12-05 10:14:29 +08:00
Danny Chan
934fe54cc5 [HUDI-2914] Fix remote timeline server config for flink (#4191) 2021-12-03 08:59:10 +08:00
Danny Chan
a2eb2b0b0a [HUDI-2480] FileSlice after pending compaction-requested instant-time… (#3703)
* [HUDI-2480] FileSlice after pending compaction-requested instant-time is ignored by MOR snapshot reader

* include file slice after a pending compaction for spark reader

Co-authored-by: garyli1019 <yanjia.gary.li@gmail.com>
2021-11-25 22:30:09 +08:00
Danny Chan
0bb506fa00 [HUDI-2847] Flink metadata table supports virtual keys (#4096) 2021-11-24 17:34:42 +08:00
Sivabalan Narayanan
fc9ca6a07a [HUDI-2559] Converting commit timestamp format to millisecs (#4024)
- Adds support for generating commit timestamps with millisecond granularity.
- Older commit timestamps (seconds granularity) are suffixed with 999 and parsed with the millisecond format.
2021-11-22 11:44:38 -05:00
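A sketch of the padding rule described above, assuming the 14-character `yyyyMMddHHmmss` instant format extends to `yyyyMMddHHmmssSSS` (the helper names are illustrative, not Hudi's actual code):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class CommitTimeDemo {
    static final String MILLIS_FMT = "yyyyMMddHHmmssSSS";

    // older instant times have seconds granularity (14 chars); suffix with "999"
    static String toMillisGranularity(String instantTime) {
        return instantTime.length() == 14 ? instantTime + "999" : instantTime;
    }

    // every instant, old or new, can then be parsed with the millisecond format
    static Date parse(String instantTime) throws ParseException {
        return new SimpleDateFormat(MILLIS_FMT).parse(toMillisGranularity(instantTime));
    }

    public static void main(String[] args) {
        System.out.println(toMillisGranularity("20211122114438"));
    }
}
```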
Danny Chan
520538b15d [HUDI-2392] Make flink parquet reader compatible with decimal BINARY encoding (#4057) 2021-11-21 13:27:18 +08:00
Danny Chan
0411f73c7d [HUDI-2804] Add option to skip compaction instants for streaming read (#4051) 2021-11-21 12:38:56 +08:00
Danny Chan
bf008762df [HUDI-2798] Fix flink query operation fields (#4041) 2021-11-19 23:39:37 +08:00
Danny Chan
71a2ae0fd6 [HUDI-2789] Flink batch upsert for non partitioned table does not work (#4028) 2021-11-18 13:59:03 +08:00
Danny Chan
6f5e661010 [HUDI-2769] Fix StreamerUtil#medianInstantTime for very near instant time (#4005) 2021-11-16 13:46:34 +08:00
Danny Chan
bc511edc85 [HUDI-2746] Do not bootstrap for flink insert overwrite (#3980) 2021-11-12 12:17:58 +08:00
yuzhaojing
6b93ccca9b [HUDI-2738] Remove the bucketAssignFunction useless context (#3972)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-11-11 21:03:01 +08:00
yuzhaojing
90f9b4562a [HUDI-2685] Support scheduling online compaction plan when there are no commit data (#3928)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2021-11-11 10:13:21 +08:00
Prashant Wason
b7ee341e14 [HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe. (#2819) 2021-11-05 09:31:42 -04:00
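The fix pattern here is the standard one for non-thread-safe formatters: give each thread its own instance via `ThreadLocal` (a generic sketch, not Hudi's exact field):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class FormatterDemo {
    // one formatter instance per thread, since SimpleDateFormat keeps
    // mutable internal state and is not thread safe
    private static final ThreadLocal<SimpleDateFormat> COMMIT_FORMATTER =
        ThreadLocal.withInitial(() -> {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
            fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
            return fmt;
        });

    static String format(Date date) {
        return COMMIT_FORMATTER.get().format(date);
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L))); // epoch, formatted in UTC
    }
}
```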
Danny Chan
3af6568d31 [HUDI-2696] Remove the aborted checkpoint notification from coordinator (#3926) 2021-11-05 16:37:23 +08:00
Danny Chan
33436aa359 Revert "[HUDI-2677] Add DFS based message queue for flink writer (#3915)" (#3923)
This reverts commit dbf8c44bdb.
2021-11-04 20:48:57 +08:00
Danny Chan
dbf8c44bdb [HUDI-2677] Add DFS based message queue for flink writer (#3915) 2021-11-04 18:09:00 +08:00
Danny Chan
689020f303 [HUDI-2684] Use DefaultHoodieRecordPayload when precombine field is specified specifically (#3922) 2021-11-04 16:23:36 +08:00
Danny Chan
87c6f9cd07 [HUDI-2654] Add compaction failed event(part2) (#3896) 2021-10-31 17:51:11 +08:00
Danny Chan
92a3c458bd [HUDI-2654] Schedules the compaction from earliest for flink (#3891) 2021-10-30 08:37:30 +08:00
Danny Chan
e5b6b8602c [HUDI-2633] Make precombine field optional for flink (#3874) 2021-10-28 13:52:06 +08:00
Danny Chan
909c3ba45e [HUDI-2632] Schema evolution for flink parquet reader (#3872) 2021-10-27 20:00:24 +08:00
Y Ethan Guo
5ed35bff83 [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module (#3741) 2021-10-22 15:58:51 -04:00
Danny Chan
aa3c4ecda5 [HUDI-2583] Refactor TestWriteCopyOnWrite test cases (#3832) 2021-10-21 12:36:41 +08:00
Danny Chan
e355ab52db [HUDI-2578] Support merging small files for flink insert operation (#3822) 2021-10-20 21:10:07 +08:00
Danny Chan
2eda3de7f9 [HUDI-2562] Embedded timeline server on JobManager (#3812) 2021-10-18 10:45:39 +08:00
Danny Chan
2c370cbae0 [HUDI-2556] Tweak some default config options for flink (#3800)
* rename write.insert.drop.duplicates to write.precombine and set it to true for COW tables
* set index.global.enabled default to true
* set compaction.target_io default to 500GB
2021-10-14 19:42:56 +08:00
Danny Chan
f897e6d73e [HUDI-2551] Support DefaultHoodieRecordPayload for flink (#3792) 2021-10-14 13:46:53 +08:00
Danny Chan
abf3e3fe71 [HUDI-2548] Flink streaming reader misses the rolling over file handles (#3787) 2021-10-14 10:36:18 +08:00
Sivabalan Narayanan
cff384d23f [HUDI-2552] Fixing some test failures to unblock broken CI master (#3793) 2021-10-13 18:44:43 -04:00
Danny Chan
ad63938890 [HUDI-2537] Fix metadata table for flink (#3774) 2021-10-10 09:30:39 +08:00
Sivabalan Narayanan
5f32162a2f [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426 (#3590)
* [HUDI-2285] Adding synchronous updates to metadata before completion of commits in the data timeline.

- This patch adds synchronous updates to the metadata table. In other words, every write is first committed to the metadata table and then to the data table. When reading the metadata table, we ignore any delta commits that are present only in the metadata table and not in the data table timeline.
- Compaction of the metadata table is fenced by the condition that compaction is triggered only when there are no inflight requests in the data table. This ensures that the base files in the metadata table are always in sync with the data table (without any holes); only the delta log files in the metadata table may contain some extra invalid commits.
- Consequently, archival of the data table also fences itself up to the compacted instant in the metadata table.
- All writes to the metadata table happen within the data table lock, so the metadata table works in single-writer mode only. This would be hard to loosen, since all writers write to the same FILES partition and would conflict anyway.
- As part of this, added lock acquisition in the data table for those committing operations that did not take a lock before (rollback, clean, compaction, cluster). Note that we are not doing any conflict resolution; we only commit while holding a lock, so that all writes to the metadata table come from a single writer.
- Also added a building block to add buckets for partitions, which will be leveraged by other indexes such as the record-level index. For now, the FILES partition has only one bucket. In general, any number of buckets per partition is allowed; each partition has a fixed fileId prefix with an incremental suffix for each bucket within the partition.
- Fixed [HUDI-2476]: retry a failed compaction if it succeeded in the metadata table on the first attempt but failed on the data table.
- Enabling the metadata table by default.
- Adding more tests for the metadata table.

Co-authored-by: Prashant Wason <pwason@uber.com>
2021-10-06 00:17:52 -04:00
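The read-side filtering described in the first bullet above can be modeled with a toy timeline: metadata commits are visible only if the matching instant also exists in the data timeline (a simplified illustration, not Hudi's API):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SyncMetadataDemo {
    final Set<String> metadataTimeline = new LinkedHashSet<>();
    final Set<String> dataTimeline = new LinkedHashSet<>();

    // every write commits to the metadata table first, then to the data table
    void write(String instant, boolean dataCommitSucceeds) {
        metadataTimeline.add(instant);     // step 1: metadata table
        if (dataCommitSucceeds) {
            dataTimeline.add(instant);     // step 2: data table
        }                                  // else: instant is invalid for readers
    }

    // valid metadata commits are the ones also present in the data timeline
    List<String> validMetadataCommits() {
        List<String> valid = new ArrayList<>();
        for (String instant : metadataTimeline) {
            if (dataTimeline.contains(instant)) {
                valid.add(instant);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        SyncMetadataDemo table = new SyncMetadataDemo();
        table.write("001", true);
        table.write("002", false); // data table commit failed
        table.write("003", true);
        System.out.println(table.validMetadataCommits());
    }
}
```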
Carl-Zhou-CN
aa546554ff [HUDI-2451] On windows client with hdfs server for wrong file separator (#3687)
Co-authored-by: yao.zhou <yao.zhou@linkflowtech.com>
2021-09-26 21:51:27 +08:00
Danny Chan
31a301f0aa [HUDI-2485] Consume as mini-batch for flink stream reader (#3710) 2021-09-24 23:44:01 +08:00
Danny Chan
440525ccbb [HUDI-2483] Infer changelog mode for flink compactor (#3706) 2021-09-24 14:52:27 +08:00
Danny Chan
3354fac42f [HUDI-2449] Incremental read for Flink (#3686) 2021-09-19 09:06:46 +08:00
Danny Chan
627f20f9c5 [HUDI-2430] Make decimal compatible with hudi for flink writer (#3658) 2021-09-15 12:04:46 +08:00
Danny Chan
b30c5bdaef [HUDI-2412] Add timestamp based partitioning for flink writer (#3638) 2021-09-11 13:17:16 +08:00
Danny Chan
db2ab9a150 [HUDI-2403] Add metadata table listing for flink query source (#3618) 2021-09-08 14:52:39 +08:00
yuzhaojing
7a1bd225ca [HUDI-2376] Add pipeline for Append mode (#3573)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-09-02 16:32:40 +08:00
Danny Chan
f66e1ce9bf [HUDI-2379] Include the pending compaction file groups for flink streaming reader (#3567) 2021-09-01 16:47:52 +08:00
Danny Chan
57668d02a0 [HUDI-2371] Improvement flink streaming reader (#3552)
- Support reading empty table
- Fix filtering by partition path
- Support reading from earliest commit
2021-08-28 20:16:54 +08:00