Commit Graph

1387 Commits

Author SHA1 Message Date
pengzhiwei
bc883db5de [HUDI-1636] Support Builder Pattern To Build Table Properties For HoodieTableConfig (#2596) 2021-03-05 14:10:27 +08:00
Raymond Xu
f53bca404f [HUDI-1655] Support custom date format and fix unsupported exception in DatePartitionPathSelector (#2621)
- Add a config to allow parsing a custom date format in `DatePartitionPathSelector`. Currently it assumes the date partition string is in the format `yyyy-MM-dd`.
- Fix a bug where `UnsupportedOperationException` was thrown when sorting `eligibleFiles` in place. The sorted result is now stored in a new list.
2021-03-04 21:01:51 -08:00
satishkotha
7cc75e0be2 [HUDI-1646] Provide mechanism to read uncommitted data through InputFormat (#2611) 2021-03-04 17:43:31 -08:00
Danny Chan
89003bc780 [HUDI-1647] Supports snapshot read for Flink (#2613) 2021-03-05 08:49:32 +08:00
Raymond Xu
899ae70fdb [HUDI-1587] Add latency and freshness support (#2541)
Save min and max of event time in each commit and compute the latency and freshness metrics.
2021-03-03 20:13:12 -08:00
Prashant Wason
f11a6c7b2d [HUDI-1553] Configuration and metrics for the TimelineService. (#2495) 2021-03-02 21:58:41 -08:00
t0il3ts0ap
4fa43359cb [MINOR] Fix default value for hoodie.deltastreamer.source.kafka.auto.reset.offsets (#2617) 2021-03-03 09:49:18 +08:00
ZhangChaoMing
0dde7f9185 [HUDI-1584] Modify marker file path, which should start with the target base path. (#2539) 2021-03-02 17:52:21 +08:00
Prashant Wason
73fa308ff0 [HUDI-1634] Re-bootstrap metadata table when un-synced instants have been archived. (#2595) 2021-03-01 20:31:55 -08:00
satishkotha
7a6b071647 [HUDI-1644] Do not delete older rollback instants as part of rollback. Archival can take care of removing old instants cleanly (#2610) 2021-03-01 09:40:00 -08:00
Sivabalan Narayanan
657e73f9b1 [HUDI-1540] Fixing commons codec dependency in bundle jars (#2562)
- Actually including `commons-codec` into the spark/utilities bundles
2021-03-01 09:34:10 -08:00
Danny Chan
7a11de1276 [HUDI-1632] Supports merge on read write mode for Flink writer (#2593)
Also supports async compaction with pluggable strategies.
2021-03-01 12:29:41 +08:00
Liulietong
be257b58c6 [HUDI-1583] Fix bug where Hudi skips the remaining log files if logFileList contains a zero-size log file during merge on read. (#2584)
Co-authored-by: liulietong <liulietong@bytedance.com>
2021-02-26 14:43:47 -08:00
Prashant Wason
022df0d1b1 [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap. (#2565) 2021-02-25 16:52:28 -08:00
Sivabalan Narayanan
9f5e8cc7c3 Fixing README for hudi test suite long running job (#2578) 2021-02-25 16:50:18 -08:00
liujinhui
8c2197ae5e [HUDI-1269] Make it configurable whether a Hive connection failure affects the Hudi ingest process (#2443)
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-02-25 10:09:32 -05:00
liujinhui
617cc24ad1 [HUDI-1367] Make deltaStreamer transition from dfsSource to kafkaSource (#2227)
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-02-25 07:08:13 -05:00
Danny Chan
06dc7c7fd8 [HUDI-1638] Some improvements to BucketAssignFunction (#2600)
- #initializeState executes before #open, so
  #checkPartitionsLoaded may see a null `initialPartitionsToLoad`
- Only load the existing partitions
2021-02-25 14:33:21 +08:00
Danny Chan
97864a48c1 [HUDI-1637] Avoid renaming for bucket update when there is only one flush action during a checkpoint (#2599)
Some object stores do not have strong read-after-write
consistency, so we should work toward removing the rename operations
in the future.
2021-02-25 10:21:27 +08:00
hj2016
77ba561a6b [HUDI-1347] Fix Hbase index to make rollback synchronous (via config) (#2188)
Co-authored-by: huangjing <huangjing@clinbrain.com>
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-02-23 20:56:58 -05:00
Raymond Xu
ab9933f206 [HUDI-1620] Add azure pipelines configs (#2582) 2021-02-23 16:52:41 -08:00
Ankush Kanungo
3b8d0f3b1f [MINOR] hive sync checks for table after creating db if auto create is true (#2591) 2021-02-23 10:35:14 -08:00
Prashant Wason
d2f360f5dd [MINOR] Ensure directory exists before listing all marker files. (#2594) 2021-02-23 08:05:59 -08:00
Shen Hong
2efd0760ac [HUDI-1477] Support copyOnWriteTable in java client (#2382) 2021-02-23 20:50:55 +08:00
Danny Chan
3ceb1b4c83 [HUDI-1624] The state based index should bootstrap from existing base files (#2581) 2021-02-23 13:37:44 +08:00
ZhangChaoMing
43a0776c7c [HUDI-1586] [Common Core] [Flink Integration] Reduce the coupling of hadoop. (#2540)
Co-authored-by: zhangchaoming <zhangchaoming@360.com>
2021-02-21 11:54:04 +08:00
n3nash
ffcfb58bac [HUDI-1486] Remove inline inflight rollback in hoodie writer (#2359)
1. Refactor rollback and move cleaning failed commits logic into cleaner
2. Introduce hoodie heartbeat to ascertain failed commits
3. Fix test cases
2021-02-19 20:12:22 -08:00
Sivabalan Narayanan
c9fcf964b2 [HUDI-1315] Adding builder for HoodieTableMetaClient initialization (#2534) 2021-02-20 09:54:26 +08:00
satishkotha
0d91c451b0 [HUDI-1539] Fix bug in HoodieCombineRealtimeRecordReader with reading empty iterators (#2583) 2021-02-19 15:45:43 -08:00
Balajee Nagasubramaniam
b0010bf3b4 [HUDI-1582] Throw an exception when syncHoodieTable() fails, with RuntimeException (#2536) 2021-02-17 17:34:15 -08:00
Karl_Wang
9431aabfab [HUDI-1381] Schedule compaction based on time elapsed (#2260)
- Introduce configs to control how compaction is triggered.
- Compaction can be triggered by elapsed time, number of delta commits, or a combination of the two.
- Default behaviour remains the same.
2021-02-17 07:44:53 -08:00
lamber-ken
c4bbcb7f0e [HUDI-1621] Gets the parallelism from context when init StreamWriteOperatorCoordinator (#2579) 2021-02-17 20:04:38 +08:00
pengzhiwei
37972071ff [HUDI-1109] Support Spark Structured Streaming read from Hudi table (#2485) 2021-02-17 03:36:29 -08:00
Danny Chan
5d2491d10c [HUDI-1598] Write as minor batches during one checkpoint interval for the new writer (#2553) 2021-02-17 15:24:50 +08:00
vinoyang
302bd29dab [MINOR] Add clustering to feature list (#2568) 2021-02-13 07:39:14 -08:00
Raymond Xu
527175ab0b [MINOR] Default to empty list for unset datadog tags property (#2574) 2021-02-13 15:52:03 +08:00
Sivabalan Narayanan
d5f202821b Adding fixes to test suite framework. Adding clustering node and validate async operations node. (#2400) 2021-02-12 09:29:21 -08:00
lamber-ken
ff0e3f5669 [HUDI-1612] Fix write test flakiness in StreamWriteITCase (#2567)
2021-02-11 23:37:19 +08:00
teeyog
26da4f5462 [HUDI-1526] Translate the api partitionBy in spark datasource to hoodie.datasource.write.partitionpath.field (#2431) 2021-02-10 12:07:54 -05:00
vinoyang
a2f85d90de [MINOR] Fix the wrong comment for HoodieJavaWriteClientExample (#2559) 2021-02-09 10:33:34 -08:00
Gary Li
7a98b1c878 [HUDI-1603] fix DefaultHoodieRecordPayload serialization failure (#2556) 2021-02-09 10:53:45 -05:00
Sun Ke
c30481f4b0 [HUDI-1545] Add test cases for INSERT_OVERWRITE Operation (#2483)
Co-authored-by: sunke.03 <sunke.03@bytedance.com>
2021-02-07 21:47:01 -08:00
Danny Chan
4c5b6923cc [HUDI-1557] Make Flink write pipeline write task scalable (#2506)
This is step 2 of RFC-24:
https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal

This PR introduces a BucketAssigner that assigns a bucket ID (partition
path & fileID) to each stream record.

The downstream pipeline no longer needs to look up the index and
partition these records.
The write target location is decided before the write: each record
computes its location when the BucketAssigner receives it, so the
indexing is done in streaming style.

Computing locations for a whole batch of records at once is resource
intensive and puts pressure on the engine,
so we should avoid it in a streaming system.
2021-02-06 22:03:52 +08:00
ZhangChaoMing
291f92069e [MINOR] Fix wrong logic for checking state condition (#2524) 2021-02-06 16:40:31 +08:00
n3nash
b2c47a24be [HUDI-1589] Fix Rollback Metadata AVRO backwards incompatibility (#2543) 2021-02-05 16:03:34 -08:00
Sivabalan Narayanan
b5d4a046bb [HUDI-1571] Adding commit_show_records_info to display record sizes for commit (#2514) 2021-02-05 07:53:24 -05:00
hiscat
b51b3a39a8 [HUDI-1420] HoodieTableMetaClient.getMarkerFolderPath works incorrectly on a Windows client with an HDFS server because of the wrong file separator (#2526)
* Fix HUDI-1420

FIX https://issues.apache.org/jira/browse/HUDI-1420

Co-authored-by: 谢波 <xiebo1@yonghui.cn>
2021-02-05 16:24:35 +08:00
Sivabalan Narayanan
4a5683d54a [MINOR] Fixing the default value for source ordering field for payload config (#2516) 2021-02-04 08:43:03 -05:00
wangxianghu
647e9faf25 [HUDI-1547] CI intermittent failure: TestJsonStringToHoodieRecordMapF… (#2521) 2021-02-04 11:20:01 +08:00
Volodymyr Burenin
17802569fd [HUDI-1538] Try to init class trying different signatures instead of checking its name (#2476)
* Removed unused imports

Co-authored-by: volodymyr.burenin <volodymyr.burenin@cloudkitchens.com>
2021-02-03 12:29:08 -08:00