1
0
Commit Graph

546 Commits

Author SHA1 Message Date
董可伦
6228b17a3d [MINOR] Fix typo, 'requried' corrected to 'required' (#3643) 2021-09-11 15:46:24 +08:00
董可伦
dbcf60f370 [MINOR] fix typo (#3640) 2021-09-11 15:45:49 +08:00
rmahindra123
e528dd798a [HUDI-2394] Implement Kafka Sink Protocol for Hudi for Ingesting Immutable Data (#3592)
- Fixing packaging, naming of classes
 - Use of log4j over slf4j for uniformity
- More follow-on fixes
 - Added a version to control/coordinator events.
 - Eliminated the config added to write config
 - Fixed fetching of checkpoints based on table type
 - Clean up of naming, code placement

Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-09-10 18:20:26 -07:00
wangxianghu
44b9bc145e [HUDI-2411] Remove unnecessary method overriden and note (#3636) 2021-09-10 18:58:34 +08:00
Y Ethan Guo
56d08fbe70 [HUDI-2351] Extract common FS and IO utils for marker mechanism (#3529) 2021-09-09 14:45:28 -04:00
Raymond Xu
57c8113ee1 [HUDI-2408] Deprecate FunctionalTestHarness to avoid init DFS (#3628) 2021-09-09 11:29:04 -04:00
liujinhui
3c4eb60913 Add the document to the PUSHGATEWAY configuration item (#3627) 2021-09-09 15:53:58 +08:00
vinoth chandar
ea59a7ff5f [HUDI-2080] Move to ubuntu-18.04 for Azure CI (#3409)
Update Azure CI ubuntu from 16.04 to 18.04 due to 16.04 will be removed soon

Fixed some consistently failed tests

* fix TestCOWDataSourceStorage TestMORDataSourceStorage
* reset mocks

Also update readme badge



Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2021-09-07 09:44:30 -07:00
Sivabalan Narayanan
f218693f5d [MINOR] Fixing some functional tests by moving to right packages (#3596) 2021-09-06 00:07:55 -04:00
Raymond Xu
6bd3ca98d6 [HUDI-1989] Fix flakiness in TestHoodieMergeOnReadTable (#3574)
* [HUDI-1989] Refactor clustering tests for MoR table

* refactor assertion helper

* add CheckedFunction

* SparkClientFunctionalTestHarness.java

* put back original test case

* move testcases out from TestHoodieMergeOnReadTable.java

* add TestHoodieSparkMergeOnReadTableRollback.java

* use SparkClientFunctionalTestHarness

* add tag
2021-09-03 13:17:17 -07:00
yuzhaojing
7a1bd225ca [HUDI-2376] Add pipeline for Append mode (#3573)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-09-02 16:32:40 +08:00
Shawy Geng
21fd6edfe7 [HUDI-2384] Change log file size config to long (#3577) 2021-09-02 11:14:09 +08:00
rmahindra123
d59c8044f8 [HUDI-2378] Add configs for common and pre validate (#3564)
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
2021-08-30 23:28:35 -04:00
Danny Chan
a60fab3a5c [HUDI-2352] The upgrade downgrade action of flink writer should be singleton (#3531) 2021-08-25 10:56:14 +08:00
Satish M
04ede8eecf [HUDI-2262] reduce build warnings (#3481) 2021-08-24 13:06:38 -04:00
zhangyue19921010
de94787a85 [HUDI-2345] Hoodie columns sort partitioner for bulk insert (#3523)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-08-24 21:45:17 +08:00
Udit Mehrotra
e39d0a2f28 Keep non-conflicting names for common configs between DataSourceOptions and HoodieWriteConfig (#3511) 2021-08-20 02:42:59 -07:00
Udit Mehrotra
c350d05dd3 Restore 0.8.0 config keys with deprecated annotation (#3506)
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-19 13:36:40 -07:00
ayachi_nene
99663d370b [HUDI-2301] fix FileSliceMetrics utils bug (#3487) 2021-08-17 11:09:53 -07:00
Udit Mehrotra
3e301196bf Moving to 0.10.0-SNAPSHOT on master branch. 2021-08-14 18:51:09 -07:00
Y Ethan Guo
23dca6c237 [HUDI-2268] Add upgrade and downgrade to and from 0.9.0 (#3470)
- Added upgrade and downgrade step to and from 0.9.0. Upgrade adds few table properties. Downgrade recreates timeline server based marker files if any.
2021-08-14 20:20:23 -04:00
Y Ethan Guo
9056c68744 [HUDI-2305] Add MARKERS.type and fix marker-based rollback (#3472)
- Rollback infers the directory structure and does rollback based on the strategy used while markers were written. "write markers type" in write config is used to determine marker strategy only for new writes.
2021-08-14 08:18:49 -04:00
Prashant Wason
8eed440694 [HUDI-2119] Ensure the rolled-back instance was previously synced to the Metadata Table when syncing a Rollback Instant. (#3210)
* [HUDI-2119] Ensure the rolled-back instance was previously synced to the Metadata Table when syncing a Rollback Instant.

If the rolled-back instant was synced to the Metadata Table, a corresponding deltacommit with the same timestamp should have been created on the Metadata Table timeline. To ensure we can always perfomr this check, the Metadata Table instants should not be archived until their corresponding instants are present in the dataset timeline. But ensuring this requires a large number of instants to be kept on the metadata table.

In this change, the metadata table will keep atleast the number of instants that the main dataset is keeping. If the instant being rolled back was before the metadata table timeline, the code will throw an exception and the metadata table will have to be re-bootstrapped. This should be a very rare occurance and should occur only when the dataset is being repaired by rolling back multiple commits or restoring to an much older time.

* Fixed checkstyle

* Improvements from review comments.

Fixed  checkstyle
Replaced explicit null check with Option.ofNullable
Removed redundant function getSynedInstantTime

* Renamed getSyncedInstantTime and getSyncedInstantTimeForReader.

Sync is confusing so renamed to getUpdateTime() and getReaderTime().

* Removed getReaderTime which is only for testing as the same method can be accessed during testing differently without making it part of the public interface.

* Fix compilation error

* Reverting changes to HoodieMetadataFileSystemView

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-13 21:23:34 -07:00
Sivabalan Narayanan
642b1b671d [HUDI-2151] Flipping defaults (#3452) 2021-08-13 19:29:22 -04:00
Sagar Sumit
0544d70d8f [MINOR] Deprecate older configs (#3464)
Rename and deprecate props in HoodieWriteConfig

Rename and deprecate older props
2021-08-12 20:31:04 -07:00
Prashant Wason
76bc686a77 [HUDI-1292] Created a config to enable/disable syncing of metadata table. (#3427)
* [HUDI-1292] Created a config to enable/disable syncing of metadata table.

- Metadata Table should only be synced from a single pipeline to prevent conflicts.
- Skip syncing metadata table for clustering and compaction
- Renamed useFileListingMetadata

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-12 15:45:57 -07:00
Prashant Wason
b3e430f24b [HUDI-2017] Add API to set a metric in the registry. (#3084)
Registry.add() API adds the new value to existing metric value. For some use-cases We need a API to set/replace the existing value.

Metadata Table is synced in preWrite() and postWrite() functions of commit. As part of the sync, the current sizes and basefile/logfile counts are published as metrics. If we use the Registry.add() API, the count and sizes are incorrectly published as sum of the two values. This is corrected by using the Registry.set() API instead.
2021-08-11 16:47:16 -07:00
zhangyue19921010
9e8308527a [HUDI-1518] Remove the logic that delete replaced file when archive (#3310)
* remove delete replaced file when archive

* done

* remove unsed import

* remove delete replaced files when archive realted UT

* code reviewed

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-08-11 10:54:44 -07:00
Y Ethan Guo
4783176554 [HUDI-1138] Add timeline-server-based marker file strategy for improving marker-related latency (#3233)
- Can be enabled for cloud stores like S3. Not supported for hdfs yet, due to partial write failures.
2021-08-11 11:48:13 -04:00
Prashant Wason
aa11989ead [HUDI-2286] Handle the case of failed deltacommit on the metadata table. (#3428)
A failed deltacommit on the metadata table will be automatically rolled back. Assuming the failed commit was "t10", the rollback will happen the next time at "t11". Post rollback, when we try to sync the dataset to the metadata table, we should look for all unsynched instants including t11. Current code ignores t11 since the latest commit timestamp on metadata table is t11 (due to rollback).
2021-08-11 07:39:48 -07:00
Sivabalan Narayanan
c9fa3cffaf [HUDI-1774] Adding support for delete_partitions to spark data source (#3437) 2021-08-11 01:03:01 -04:00
swuferhong
5448cdde7e [HUDI-2170] [HUDI-1763] Always choose the latest record for HoodieRecordPayload (#3401) 2021-08-11 10:20:55 +08:00
Sivabalan Narayanan
1196736185 [HUDI-1129] Improving schema evolution support in hudi (#2927)
* Adding support to ingest records with old schema after table's schema is evolved

* Rebasing against latest master

- Trimming test file to be < 800 lines
- Renaming config names

* Addressing feedback

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-10 09:15:37 -07:00
swuferhong
21db6d7a84 [HUDI-1771] Propagate CDC format for hoodie (#3285) 2021-08-10 20:23:23 +08:00
zhangyue19921010
b4441abcf7 [HUDI-2194] Skip the latest N partitions when choosing partitions to create ClusteringPlan (#3300)
* skip from latest partitions based on hoodie.clustering.plan.strategy.daybased.skipfromlatest.partitions && 0(default means skip nothing)

* change config verison

* add ut

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-08-09 10:10:15 -07:00
Sagar Sumit
70b6bd485f [HUDI-1468] Support custom clustering strategies and preserve commit metadata as part of clustering (#3419)
Co-authored-by: Satish Kotha <satishkotha@uber.com>
2021-08-06 22:53:08 -04:00
Danny Chan
20feb1a897 [HUDI-2278] Use INT64 timestamp with precision 3 for flink parquet writer (#3414) 2021-08-06 11:06:21 +08:00
Danny Chan
b7586a5632 [HUDI-2274] Allows INSERT duplicates for Flink MOR table (#3403) 2021-08-06 10:30:52 +08:00
Sivabalan Narayanan
1df5ded433 [HUDI-2273] Migrating some long running tests to functional test profile (#3398) 2021-08-04 19:08:50 -04:00
yuzhaojing
b8b9d6db83 [HUDI-2087] Support Append only in Flink stream (#3390)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-08-04 17:53:20 +08:00
Danny Chan
02331fc223 [HUDI-2258] Metadata table for flink (#3381) 2021-08-04 10:54:55 +08:00
wenningd
91bb0d1318 [HUDI-2255] Refactor Datasource options (#3373)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-08-03 17:50:30 -07:00
Udit Mehrotra
1ff2d3459a [HUDI-1371] [HUDI-1893] Support metadata based listing for Spark DataSource and Spark SQL (#2893) 2021-08-03 14:47:40 -07:00
satishkotha
826a04d142 [HUDI-2072] Add pre-commit validator framework (#3153)
* [HUDI-2072] Add pre-commit validator framework

* trigger Travis rebuild
2021-08-03 12:07:45 -07:00
Danny Chan
bec23bda50 [HUDI-2269] Release the disk map resource for flink streaming reader (#3384) 2021-08-03 13:55:35 +08:00
Sivabalan Narayanan
fe508376fa [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR table (#3315) 2021-08-02 09:45:09 -04:00
Gary Li
6353fc865f [HUDI-2218] Fix missing HoodieWriteStat in HoodieCreateHandle (#3341) 2021-07-30 02:36:57 -07:00
Danny Chan
c4e45a0010 [HUDI-2254] Builtin sort operator for flink bulk insert (#3372) 2021-07-30 16:58:11 +08:00
Shawy Geng
44e41dc9bb [HUDI-2117] Unpersist the input rdd after the commit is completed to … (#3207)
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-07-29 08:16:58 -07:00
pengzhiwei
bbadac7de1 [HUDI-1425] Performance loss with the additional hoodieRecords.isEmpty() in HoodieSparkSqlWriter#write (#2296) 2021-07-28 21:30:18 -07:00