1398 Commits

Author SHA1 Message Date
Danny Chan
12ff562d2b [HUDI-1678] Row level delete for Flink sink (#2659) 2021-03-11 19:44:06 +08:00
Danny Chan
2fdae6835c [HUDI-1663] Streaming read for Flink MOR table (#2640)
Supports two read modes:
* Read the full data set starting from the latest commit instant,
  followed by the subsequent incremental data set
* Read the data set starting from a specified commit instant
2021-03-10 22:44:06 +08:00
satishkotha
c4a66324cd [HUDI-1651] Fix archival of requested replacecommit (#2622) 2021-03-09 15:56:44 -08:00
Balajee Nagasubramaniam
d8af24d8a2 [HUDI-1635] Improvements to Hudi Test Suite (#2628) 2021-03-09 13:29:38 -08:00
Raymond Xu
d3a451611c [MINOR] HoodieClientTestHarness close resources in AfterAll phase (#2646)
Parameterized test cases like `org.apache.hudi.table.upgrade.TestUpgradeDowngrade#testUpgrade` become flaky when `org.apache.hadoop.fs.FileSystem#closeAll` is invoked in BeforeEach; it should be invoked in AfterAll instead.
2021-03-08 17:36:03 +08:00
Shen Hong
8b9dea4ad9 [HUDI-1673] Replace scala.Tuple2 with Pair in FlinkHoodieBloomIndex (#2642) 2021-03-08 14:30:34 +08:00
xiarixiaoyao
02073235c3 [HUDI-1662] Fix hive date type conversion for mor table (#2634) 2021-03-08 12:16:13 +08:00
Sivabalan Narayanan
5cf2f2618b [HUDI-1618] Fixing NPE with Parquet src in multi table delta streamer (#2577) 2021-03-07 16:40:40 -05:00
Raymond Xu
9437e0ddef [MINOR] Fix import in StreamerUtil.java (#2638) 2021-03-07 12:37:15 -08:00
satishkotha
11ad4ed26b [HUDI-1661] Exclude clustering commits from getExtraMetadataFromLatest API (#2632) 2021-03-05 13:42:19 -08:00
n3nash
f2159c4573 [HUDI-1660] Excluding compaction and clustering instants from inflight rollback (#2631) 2021-03-05 11:18:09 -08:00
pengzhiwei
bc883db5de [HUDI-1636] Support Builder Pattern To Build Table Properties For HoodieTableConfig (#2596) 2021-03-05 14:10:27 +08:00
Raymond Xu
f53bca404f [HUDI-1655] Support custom date format and fix unsupported exception in DatePartitionPathSelector (#2621)
- Add a config to allow parsing custom date format in `DatePartitionPathSelector`. Currently it assumes date partition string in the format of `yyyy-MM-dd`.
- Fix a bug where `UnsupportedOperationException` was thrown when sorting `eligibleFiles` in place. The sorted result is now stored in a new list.
2021-03-04 21:01:51 -08:00
satishkotha
7cc75e0be2 [HUDI-1646] Provide mechanism to read uncommitted data through InputFormat (#2611) 2021-03-04 17:43:31 -08:00
Danny Chan
89003bc780 [HUDI-1647] Supports snapshot read for Flink (#2613) 2021-03-05 08:49:32 +08:00
Raymond Xu
899ae70fdb [HUDI-1587] Add latency and freshness support (#2541)
Save min and max of event time in each commit and compute the latency and freshness metrics.
2021-03-03 20:13:12 -08:00
Prashant Wason
f11a6c7b2d [HUDI-1553] Configuration and metrics for the TimelineService. (#2495) 2021-03-02 21:58:41 -08:00
t0il3ts0ap
4fa43359cb [MINOR] Fix default value for hoodie.deltastreamer.source.kafka.auto.reset.offsets (#2617) 2021-03-03 09:49:18 +08:00
ZhangChaoMing
0dde7f9185 [HUDI-1584] Modify marker file path, which should start with the target base path. (#2539) 2021-03-02 17:52:21 +08:00
Prashant Wason
73fa308ff0 [HUDI-1634] Re-bootstrap metadata table when un-synced instants have been archived. (#2595) 2021-03-01 20:31:55 -08:00
satishkotha
7a6b071647 [HUDI-1644] Do not delete older rollback instants as part of rollback. Archival can take care of removing old instants cleanly (#2610) 2021-03-01 09:40:00 -08:00
Sivabalan Narayanan
657e73f9b1 [HUDI-1540] Fixing commons codec dependency in bundle jars (#2562)
- Actually including `commons-codec` into the spark/utilities bundles
2021-03-01 09:34:10 -08:00
Danny Chan
7a11de1276 [HUDI-1632] Supports merge on read write mode for Flink writer (#2593)
Also supports async compaction with pluggable strategies.
2021-03-01 12:29:41 +08:00
Liulietong
be257b58c6 [HUDI-1583] Fix bug where Hudi skips the remaining log files if logFileList contains a zero-size log file during merge on read. (#2584)
Co-authored-by: liulietong <liulietong@bytedance.com>
2021-02-26 14:43:47 -08:00
Prashant Wason
022df0d1b1 [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap. (#2565) 2021-02-25 16:52:28 -08:00
Sivabalan Narayanan
9f5e8cc7c3 Fixing README for hudi test suite long running job (#2578) 2021-02-25 16:50:18 -08:00
liujinhui
8c2197ae5e [HUDI-1269] Make it configurable whether a Hive connection failure affects the Hudi ingest process (#2443)
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-02-25 10:09:32 -05:00
liujinhui
617cc24ad1 [HUDI-1367] Make deltaStreamer transition from dfsSource to kafkaSource (#2227)
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-02-25 07:08:13 -05:00
Danny Chan
06dc7c7fd8 [HUDI-1638] Some improvements to BucketAssignFunction (#2600)
- The #initializeState executes before #open, thus, the
  #checkPartitionsLoaded may see null `initialPartitionsToLoad`
- Only load the existing partitions
2021-02-25 14:33:21 +08:00
Danny Chan
97864a48c1 [HUDI-1637] Avoid rename for bucket update when there is only one flush action during a checkpoint (#2599)
Some object stores do not have strong read-after-write
consistency; we should work toward removing the rename operations in the
future.
2021-02-25 10:21:27 +08:00
hj2016
77ba561a6b [HUDI-1347] Fix Hbase index to make rollback synchronous (via config) (#2188)
Co-authored-by: huangjing <huangjing@clinbrain.com>
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-02-23 20:56:58 -05:00
Raymond Xu
ab9933f206 [HUDI-1620] Add azure pipelines configs (#2582) 2021-02-23 16:52:41 -08:00
Ankush Kanungo
3b8d0f3b1f [MINOR] hive sync checks for table after creating db if auto create is true (#2591) 2021-02-23 10:35:14 -08:00
Prashant Wason
d2f360f5dd [MINOR] Ensure directory exists before listing all marker files. (#2594) 2021-02-23 08:05:59 -08:00
Shen Hong
2efd0760ac [HUDI-1477] Support copyOnWriteTable in java client (#2382) 2021-02-23 20:50:55 +08:00
Danny Chan
3ceb1b4c83 [HUDI-1624] The state based index should bootstrap from existing base files (#2581) 2021-02-23 13:37:44 +08:00
ZhangChaoMing
43a0776c7c [HUDI-1586] [Common Core] [Flink Integration] Reduce the coupling of hadoop. (#2540)
Co-authored-by: zhangchaoming <zhangchaoming@360.com>
2021-02-21 11:54:04 +08:00
n3nash
ffcfb58bac [HUDI-1486] Remove inline inflight rollback in hoodie writer (#2359)
1. Refactor rollback and move cleaning failed commits logic into cleaner
2. Introduce hoodie heartbeat to ascertain failed commits
3. Fix test cases
2021-02-19 20:12:22 -08:00
Sivabalan Narayanan
c9fcf964b2 [HUDI-1315] Adding builder for HoodieTableMetaClient initialization (#2534) 2021-02-20 09:54:26 +08:00
satishkotha
0d91c451b0 [HUDI-1539] Fix bug in HoodieCombineRealtimeRecordReader with reading empty iterators (#2583) 2021-02-19 15:45:43 -08:00
Balajee Nagasubramaniam
b0010bf3b4 [HUDI-1582] Throw an exception when syncHoodieTable() fails, with RuntimeException (#2536) 2021-02-17 17:34:15 -08:00
Karl_Wang
9431aabfab [HUDI-1381] Schedule compaction based on time elapsed (#2260)
- introduce configs to control how compaction is triggered
- Compaction can be triggered using time, number of delta commits and/or combinations
- Default behaviour remains the same.
2021-02-17 07:44:53 -08:00
lamber-ken
c4bbcb7f0e [HUDI-1621] Get the parallelism from context when initializing StreamWriteOperatorCoordinator (#2579) 2021-02-17 20:04:38 +08:00
pengzhiwei
37972071ff [HUDI-1109] Support Spark Structured Streaming read from Hudi table (#2485) 2021-02-17 03:36:29 -08:00
Danny Chan
5d2491d10c [HUDI-1598] Write as minor batches during one checkpoint interval for the new writer (#2553) 2021-02-17 15:24:50 +08:00
vinoyang
302bd29dab [MINOR] Add clustering to feature list (#2568) 2021-02-13 07:39:14 -08:00
Raymond Xu
527175ab0b [MINOR] Default to empty list for unset datadog tags property (#2574) 2021-02-13 15:52:03 +08:00
Sivabalan Narayanan
d5f202821b Adding fixes to test suite framework. Adding clustering node and validate async operations node. (#2400) 2021-02-12 09:29:21 -08:00
lamber-ken
ff0e3f5669 [HUDI-1612] Fix write test flakiness in StreamWriteITCase (#2567)
2021-02-11 23:37:19 +08:00
teeyog
26da4f5462 [HUDI-1526] Translate the api partitionBy in spark datasource to hoodie.datasource.write.partitionpath.field (#2431) 2021-02-10 12:07:54 -05:00