Danny Chan
12ff562d2b
[HUDI-1678] Row level delete for Flink sink ( #2659 )
2021-03-11 19:44:06 +08:00
Danny Chan
2fdae6835c
[HUDI-1663] Streaming read for Flink MOR table ( #2640 )
...
Supports two read modes:
* Read the full data set as of the latest commit instant, plus the
subsequent incremental data set
* Read the data set starting from a specified commit instant
2021-03-10 22:44:06 +08:00
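The two read modes differ only in where consumption starts. The sketch below is a minimal illustration and is not Hudi's streaming source. It assumes completed instant times are fixed-width `yyyyMMddHHmmss` strings, so lexicographic order matches commit order. The helper `instantsFrom` is hypothetical, and it treats the start instant as inclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StreamingReadStartSketch {

  // Hypothetical helper: returns the completed instants at or after startInstant.
  // String comparison orders instants correctly only because they share a
  // fixed-width timestamp format such as yyyyMMddHHmmss (an assumption here).
  static List<String> instantsFrom(List<String> completedInstants, String startInstant) {
    List<String> result = new ArrayList<>();
    for (String instant : completedInstants) {
      if (instant.compareTo(startInstant) >= 0) {
        result.add(instant);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> timeline = Arrays.asList("20210310100000", "20210310110000", "20210310120000");

    // Mode 1: read the full snapshot as of the latest commit, then consume only
    // the instants that complete after it.
    String latest = timeline.get(timeline.size() - 1);
    System.out.println("full snapshot as of " + latest);

    // Mode 2: start reading from a user-specified commit instant.
    System.out.println(instantsFrom(timeline, "20210310110000"));
  }
}
```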
satishkotha
c4a66324cd
[HUDI-1651] Fix archival of requested replacecommit ( #2622 )
2021-03-09 15:56:44 -08:00
Balajee Nagasubramaniam
d8af24d8a2
[HUDI-1635] Improvements to Hudi Test Suite ( #2628 )
2021-03-09 13:29:38 -08:00
Raymond Xu
d3a451611c
[MINOR] HoodieClientTestHarness close resources in AfterAll phase ( #2646 )
...
Parameterized test cases like `org.apache.hudi.table.upgrade.TestUpgradeDowngrade#testUpgrade` become flaky when org.apache.hadoop.fs.FileSystem#closeAll is invoked in BeforeEach; it should be invoked in AfterAll instead.
2021-03-08 17:36:03 +08:00
Shen Hong
8b9dea4ad9
[HUDI-1673] Replace scala.Tuple2 with Pair in FlinkHoodieBloomIndex ( #2642 )
2021-03-08 14:30:34 +08:00
xiarixiaoyao
02073235c3
[HUDI-1662] Fix Hive date type conversion for MOR table ( #2634 )
2021-03-08 12:16:13 +08:00
Sivabalan Narayanan
5cf2f2618b
[HUDI-1618] Fixing NPE with Parquet src in multi table delta streamer ( #2577 )
2021-03-07 16:40:40 -05:00
Raymond Xu
9437e0ddef
[MINOR] Fix import in StreamerUtil.java ( #2638 )
2021-03-07 12:37:15 -08:00
satishkotha
11ad4ed26b
[HUDI-1661] Exclude clustering commits from getExtraMetadataFromLatest API ( #2632 )
2021-03-05 13:42:19 -08:00
n3nash
f2159c4573
[HUDI-1660] Excluding compaction and clustering instants from inflight rollback ( #2631 )
2021-03-05 11:18:09 -08:00
pengzhiwei
bc883db5de
[HUDI-1636] Support Builder Pattern To Build Table Properties For HoodieTableConfig ( #2596 )
2021-03-05 14:10:27 +08:00
Raymond Xu
f53bca404f
[HUDI-1655] Support custom date format and fix unsupported exception in DatePartitionPathSelector ( #2621 )
...
- Add a config to allow parsing a custom date format in `DatePartitionPathSelector`. Currently it assumes the date partition string is in the format `yyyy-MM-dd`.
- Fix a bug where `UnsupportedOperationException` was thrown when sorting `eligibleFiles` in place. It is now sorted into a new list instead.
2021-03-04 21:01:51 -08:00
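Both changes in this commit can be illustrated with plain JDK calls. The sketch below is not the actual `DatePartitionPathSelector` code. The `pattern` argument stands in for the new config value, and both helpers are hypothetical. Sorting file paths as strings is also just for illustration. The sort fix relies on documented JDK behavior: an unmodifiable list rejects an in-place sort, while a copy can be sorted freely.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DatePartitionSketch {

  // Hypothetical helper: parses a partition name with a configurable pattern
  // instead of a hard-coded yyyy-MM-dd.
  static LocalDate parsePartition(String partition, String pattern) {
    return LocalDate.parse(partition, DateTimeFormatter.ofPattern(pattern));
  }

  // Hypothetical helper: sorts a copy, so an unmodifiable input is never mutated.
  static List<String> sortedCopy(List<String> eligibleFiles) {
    List<String> sorted = new ArrayList<>(eligibleFiles);
    Collections.sort(sorted);
    return sorted;
  }

  public static void main(String[] args) {
    System.out.println(parsePartition("2021-03-04", "yyyy-MM-dd")); // 2021-03-04
    System.out.println(parsePartition("20210304", "yyyyMMdd"));     // 2021-03-04

    List<String> eligibleFiles = Collections.unmodifiableList(
        Arrays.asList("2021-03-04/b.parquet", "2021-03-03/a.parquet"));
    try {
      Collections.sort(eligibleFiles); // in-place sort of an unmodifiable list
    } catch (UnsupportedOperationException e) {
      System.out.println("in-place sort rejected");
    }
    System.out.println(sortedCopy(eligibleFiles));
  }
}
```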
satishkotha
7cc75e0be2
[HUDI-1646] Provide mechanism to read uncommitted data through InputFormat ( #2611 )
2021-03-04 17:43:31 -08:00
Danny Chan
89003bc780
[HUDI-1647] Supports snapshot read for Flink ( #2613 )
2021-03-05 08:49:32 +08:00
Raymond Xu
899ae70fdb
[HUDI-1587] Add latency and freshness support ( #2541 )
...
Save the min and max event time in each commit and compute the latency and freshness metrics from them.
2021-03-03 20:13:12 -08:00
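A minimal arithmetic sketch of the two metrics follows, under an assumption not stated in the commit message. Latency is taken as the time from the oldest record (min event time) to commit completion. Freshness is taken as the time from the newest record (max event time) to commit completion. The method names are hypothetical.

```java
public class CommitLatencySketch {

  // Assumption: latency spans from the oldest event in the commit (min event time)
  // to the commit completion time.
  static long latencyMs(long commitCompletionMs, long minEventTimeMs) {
    return commitCompletionMs - minEventTimeMs;
  }

  // Assumption: freshness spans from the newest event in the commit (max event time)
  // to the commit completion time.
  static long freshnessMs(long commitCompletionMs, long maxEventTimeMs) {
    return commitCompletionMs - maxEventTimeMs;
  }

  public static void main(String[] args) {
    long commitCompletionMs = 1_000_000L;
    long minEventTimeMs = 940_000L; // oldest event in the commit
    long maxEventTimeMs = 995_000L; // newest event in the commit
    System.out.println(latencyMs(commitCompletionMs, minEventTimeMs));   // 60000
    System.out.println(freshnessMs(commitCompletionMs, maxEventTimeMs)); // 5000
  }
}
```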
Prashant Wason
f11a6c7b2d
[HUDI-1553] Configuration and metrics for the TimelineService. ( #2495 )
2021-03-02 21:58:41 -08:00
t0il3ts0ap
4fa43359cb
[MINOR] Fix default value for hoodie.deltastreamer.source.kafka.auto.reset.offsets ( #2617 )
2021-03-03 09:49:18 +08:00
ZhangChaoMing
0dde7f9185
[HUDI-1584] Modify the marker file path, which should start with the target base path. ( #2539 )
2021-03-02 17:52:21 +08:00
Prashant Wason
73fa308ff0
[HUDI-1634] Re-bootstrap metadata table when un-synced instants have been archived. ( #2595 )
2021-03-01 20:31:55 -08:00
satishkotha
7a6b071647
[HUDI-1644] Do not delete older rollback instants as part of rollback. Archival can take care of removing old instants cleanly ( #2610 )
2021-03-01 09:40:00 -08:00
Sivabalan Narayanan
657e73f9b1
[HUDI-1540] Fixing commons codec dependency in bundle jars ( #2562 )
...
- Actually include `commons-codec` in the spark/utilities bundles
2021-03-01 09:34:10 -08:00
Danny Chan
7a11de1276
[HUDI-1632] Supports merge on read write mode for Flink writer ( #2593 )
...
Also supports async compaction with pluggable strategies.
2021-03-01 12:29:41 +08:00
Liulietong
be257b58c6
[HUDI-1583] Fix a bug where Hudi skips the remaining log files when logFileList contains a zero-size log file during merge on read. ( #2584 )
...
Co-authored-by: liulietong <liulietong@bytedance.com>
2021-02-26 14:43:47 -08:00
Prashant Wason
022df0d1b1
[HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap. ( #2565 )
2021-02-25 16:52:28 -08:00
Sivabalan Narayanan
9f5e8cc7c3
Fixing README for hudi test suite long running job ( #2578 )
2021-02-25 16:50:18 -08:00
liujinhui
8c2197ae5e
[HUDI-1269] Make it configurable whether a failure to connect to Hive affects the Hudi ingest process ( #2443 )
...
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-02-25 10:09:32 -05:00
liujinhui
617cc24ad1
[HUDI-1367] Make DeltaStreamer transition from dfsSource to kafkaSource ( #2227 )
...
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-02-25 07:08:13 -05:00
Danny Chan
06dc7c7fd8
[HUDI-1638] Some improvements to BucketAssignFunction ( #2600 )
...
- #initializeState executes before #open, so
#checkPartitionsLoaded may see a null `initialPartitionsToLoad`
- Only load the existing partitions
2021-02-25 14:33:21 +08:00
Danny Chan
97864a48c1
[HUDI-1637] Avoid renaming for bucket update when there is only one flush action during a checkpoint ( #2599 )
...
Some object stores do not have strong read-after-write
consistency, so we should aim to remove the rename operations in the
future.
2021-02-25 10:21:27 +08:00
hj2016
77ba561a6b
[HUDI-1347] Fix HBase index to make rollback synchronous (via config) ( #2188 )
...
Co-authored-by: huangjing <huangjing@clinbrain.com>
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-02-23 20:56:58 -05:00
Raymond Xu
ab9933f206
[HUDI-1620] Add azure pipelines configs ( #2582 )
2021-02-23 16:52:41 -08:00
Ankush Kanungo
3b8d0f3b1f
[MINOR] hive sync checks for table after creating db if auto create is true ( #2591 )
2021-02-23 10:35:14 -08:00
Prashant Wason
d2f360f5dd
[MINOR] Ensure directory exists before listing all marker files. ( #2594 )
2021-02-23 08:05:59 -08:00
Shen Hong
2efd0760ac
[HUDI-1477] Support copyOnWriteTable in java client ( #2382 )
2021-02-23 20:50:55 +08:00
Danny Chan
3ceb1b4c83
[HUDI-1624] The state based index should bootstrap from existing base files ( #2581 )
2021-02-23 13:37:44 +08:00
ZhangChaoMing
43a0776c7c
[HUDI-1586] [Common Core] [Flink Integration] Reduce the coupling with Hadoop. ( #2540 )
...
Co-authored-by: zhangchaoming <zhangchaoming@360.com>
2021-02-21 11:54:04 +08:00
n3nash
ffcfb58bac
[HUDI-1486] Remove inline inflight rollback in hoodie writer ( #2359 )
...
1. Refactor rollback and move cleaning failed commits logic into cleaner
2. Introduce hoodie heartbeat to ascertain failed commits
3. Fix test cases
2021-02-19 20:12:22 -08:00
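Heartbeat-based failure detection treats a commit as failed once its writer has been silent for too long. The sketch below shows that general technique only. The rule "expired after `intervalMs * (tolerableMisses + 1)` without a heartbeat" is a hypothetical policy, not Hudi's exact implementation, and the method name is hypothetical too.

```java
public class HeartbeatSketch {

  // Hypothetical rule: an inflight commit is considered failed when its writer's last
  // heartbeat is older than intervalMs * (tolerableMisses + 1).
  static boolean isHeartbeatExpired(long lastHeartbeatMs, long nowMs,
                                    long intervalMs, int tolerableMisses) {
    long maxSilenceMs = intervalMs * (tolerableMisses + 1);
    return nowMs - lastHeartbeatMs > maxSilenceMs;
  }

  public static void main(String[] args) {
    long intervalMs = 60_000L;
    // 90 s of silence with a 180 s budget: writer still considered alive.
    System.out.println(isHeartbeatExpired(0L, 90_000L, intervalMs, 2));  // false
    // 200 s of silence exceeds the budget: commit can be rolled back by the cleaner.
    System.out.println(isHeartbeatExpired(0L, 200_000L, intervalMs, 2)); // true
  }
}
```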
Sivabalan Narayanan
c9fcf964b2
[HUDI-1315] Adding builder for HoodieTableMetaClient initialization ( #2534 )
2021-02-20 09:54:26 +08:00
satishkotha
0d91c451b0
[HUDI-1539] Fix bug in HoodieCombineRealtimeRecordReader with reading empty iterators ( #2583 )
2021-02-19 15:45:43 -08:00
Balajee Nagasubramaniam
b0010bf3b4
[HUDI-1582] Throw a RuntimeException when syncHoodieTable() fails ( #2536 )
2021-02-17 17:34:15 -08:00
Karl_Wang
9431aabfab
[HUDI-1381] Schedule compaction based on time elapsed ( #2260 )
...
- Introduce configs to control how compaction is triggered
- Compaction can be triggered by elapsed time, by number of delta commits, or by combinations of both
- Default behaviour remains the same.
2021-02-17 07:44:53 -08:00
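The trigger options described above reduce to a small decision function. The sketch below is illustrative only; the enum, the method, and the threshold parameters are hypothetical and are not Hudi's actual classes or config keys. It shows how a count threshold, a time threshold, and their AND/OR combinations decide whether to schedule compaction.

```java
public class CompactionTriggerSketch {

  // Hypothetical trigger modes: by delta commit count, by elapsed time, or both combined.
  enum TriggerStrategy { NUM_COMMITS, TIME_ELAPSED, NUM_AND_TIME, NUM_OR_TIME }

  static boolean shouldCompact(TriggerStrategy strategy,
                               int deltaCommitsSinceLastCompaction, int maxDeltaCommits,
                               long secondsSinceLastCompaction, long maxDeltaSeconds) {
    boolean enoughCommits = deltaCommitsSinceLastCompaction >= maxDeltaCommits;
    boolean enoughTime = secondsSinceLastCompaction >= maxDeltaSeconds;
    switch (strategy) {
      case NUM_COMMITS:
        return enoughCommits;
      case TIME_ELAPSED:
        return enoughTime;
      case NUM_AND_TIME:
        return enoughCommits && enoughTime;
      case NUM_OR_TIME:
        return enoughCommits || enoughTime;
      default:
        throw new IllegalArgumentException("Unknown strategy: " + strategy);
    }
  }

  public static void main(String[] args) {
    // 3 delta commits so far (threshold 5), 7200 s elapsed (threshold 3600 s).
    System.out.println(shouldCompact(TriggerStrategy.NUM_COMMITS, 3, 5, 7200, 3600));  // false
    System.out.println(shouldCompact(TriggerStrategy.TIME_ELAPSED, 3, 5, 7200, 3600)); // true
    System.out.println(shouldCompact(TriggerStrategy.NUM_AND_TIME, 3, 5, 7200, 3600)); // false
    System.out.println(shouldCompact(TriggerStrategy.NUM_OR_TIME, 3, 5, 7200, 3600));  // true
  }
}
```

Keeping a count-only mode as one option matches the note that default behaviour remains the same.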
lamber-ken
c4bbcb7f0e
[HUDI-1621] Get the parallelism from the context when initializing StreamWriteOperatorCoordinator ( #2579 )
2021-02-17 20:04:38 +08:00
pengzhiwei
37972071ff
[HUDI-1109] Support Spark Structured Streaming read from Hudi table ( #2485 )
2021-02-17 03:36:29 -08:00
Danny Chan
5d2491d10c
[HUDI-1598] Write as minor batches during one checkpoint interval for the new writer ( #2553 )
2021-02-17 15:24:50 +08:00
vinoyang
302bd29dab
[MINOR] Add clustering to feature list ( #2568 )
2021-02-13 07:39:14 -08:00
Raymond Xu
527175ab0b
[MINOR] Default to empty list for unset datadog tags property ( #2574 )
2021-02-13 15:52:03 +08:00
Sivabalan Narayanan
d5f202821b
Adding fixes to test suite framework. Adding clustering node and validate async operations node. ( #2400 )
2021-02-12 09:29:21 -08:00
lamber-ken
ff0e3f5669
[HUDI-1612] Fix write test flakiness in StreamWriteITCase ( #2567 )
...
* [HUDI-1612] Fix write test flakiness in StreamWriteITCase
2021-02-11 23:37:19 +08:00
teeyog
26da4f5462
[HUDI-1526] Translate the api partitionBy in spark datasource to hoodie.datasource.write.partitionpath.field ( #2431 )
2021-02-10 12:07:54 -05:00