1
0
Commit Graph

189 Commits

Author SHA1 Message Date
Ankush Kanungo
4f991ee352 [HUDI-2398] Collect event time for inserts in DefaultHoodieRecordPayload (#3602) 2021-09-11 20:27:40 -07:00
董可伦
6228b17a3d [MINOR] Fix typo, 'requried' corrected to 'required' (#3643) 2021-09-11 15:46:24 +08:00
rmahindra123
e528dd798a [HUDI-2394] Implement Kafka Sink Protocol for Hudi for Ingesting Immutable Data (#3592)
- Fixing packaging, naming of classes
 - Use of log4j over slf4j for uniformity
- More follow-on fixes
 - Added a version to control/coordinator events.
 - Eliminated the config added to write config
 - Fixed fetching of checkpoints based on table type
 - Clean up of naming, code placement

Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-09-10 18:20:26 -07:00
wangxianghu
44b9bc145e [HUDI-2411] Remove unnecessary method overriden and note (#3636) 2021-09-10 18:58:34 +08:00
Y Ethan Guo
56d08fbe70 [HUDI-2351] Extract common FS and IO utils for marker mechanism (#3529) 2021-09-09 14:45:28 -04:00
liujinhui
3c4eb60913 Add the document to the PUSHGATEWAY configuration item (#3627) 2021-09-09 15:53:58 +08:00
vinoth chandar
ea59a7ff5f [HUDI-2080] Move to ubuntu-18.04 for Azure CI (#3409)
Update Azure CI ubuntu from 16.04 to 18.04 due to 16.04 will be removed soon

Fixed some consistently failed tests

* fix TestCOWDataSourceStorage TestMORDataSourceStorage
* reset mocks

Also update readme badge



Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2021-09-07 09:44:30 -07:00
Raymond Xu
6bd3ca98d6 [HUDI-1989] Fix flakiness in TestHoodieMergeOnReadTable (#3574)
* [HUDI-1989] Refactor clustering tests for MoR table

* refactor assertion helper

* add CheckedFunction

* SparkClientFunctionalTestHarness.java

* put back original test case

* move testcases out from TestHoodieMergeOnReadTable.java

* add TestHoodieSparkMergeOnReadTableRollback.java

* use SparkClientFunctionalTestHarness

* add tag
2021-09-03 13:17:17 -07:00
Shawy Geng
21fd6edfe7 [HUDI-2384] Change log file size config to long (#3577) 2021-09-02 11:14:09 +08:00
rmahindra123
d59c8044f8 [HUDI-2378] Add configs for common and pre validate (#3564)
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
2021-08-30 23:28:35 -04:00
zhangyue19921010
de94787a85 [HUDI-2345] Hoodie columns sort partitioner for bulk insert (#3523)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-08-24 21:45:17 +08:00
Udit Mehrotra
e39d0a2f28 Keep non-conflicting names for common configs between DataSourceOptions and HoodieWriteConfig (#3511) 2021-08-20 02:42:59 -07:00
Udit Mehrotra
c350d05dd3 Restore 0.8.0 config keys with deprecated annotation (#3506)
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-19 13:36:40 -07:00
ayachi_nene
99663d370b [HUDI-2301] fix FileSliceMetrics utils bug (#3487) 2021-08-17 11:09:53 -07:00
Udit Mehrotra
3e301196bf Moving to 0.10.0-SNAPSHOT on master branch. 2021-08-14 18:51:09 -07:00
Y Ethan Guo
23dca6c237 [HUDI-2268] Add upgrade and downgrade to and from 0.9.0 (#3470)
- Added upgrade and downgrade step to and from 0.9.0. Upgrade adds few table properties. Downgrade recreates timeline server based marker files if any.
2021-08-14 20:20:23 -04:00
Y Ethan Guo
9056c68744 [HUDI-2305] Add MARKERS.type and fix marker-based rollback (#3472)
- Rollback infers the directory structure and does rollback based on the strategy used while markers were written. "write markers type" in write config is used to determine marker strategy only for new writes.
2021-08-14 08:18:49 -04:00
Prashant Wason
8eed440694 [HUDI-2119] Ensure the rolled-back instance was previously synced to the Metadata Table when syncing a Rollback Instant. (#3210)
* [HUDI-2119] Ensure the rolled-back instance was previously synced to the Metadata Table when syncing a Rollback Instant.

If the rolled-back instant was synced to the Metadata Table, a corresponding deltacommit with the same timestamp should have been created on the Metadata Table timeline. To ensure we can always perfomr this check, the Metadata Table instants should not be archived until their corresponding instants are present in the dataset timeline. But ensuring this requires a large number of instants to be kept on the metadata table.

In this change, the metadata table will keep atleast the number of instants that the main dataset is keeping. If the instant being rolled back was before the metadata table timeline, the code will throw an exception and the metadata table will have to be re-bootstrapped. This should be a very rare occurance and should occur only when the dataset is being repaired by rolling back multiple commits or restoring to an much older time.

* Fixed checkstyle

* Improvements from review comments.

Fixed  checkstyle
Replaced explicit null check with Option.ofNullable
Removed redundant function getSynedInstantTime

* Renamed getSyncedInstantTime and getSyncedInstantTimeForReader.

Sync is confusing so renamed to getUpdateTime() and getReaderTime().

* Removed getReaderTime which is only for testing as the same method can be accessed during testing differently without making it part of the public interface.

* Fix compilation error

* Reverting changes to HoodieMetadataFileSystemView

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-13 21:23:34 -07:00
Sivabalan Narayanan
642b1b671d [HUDI-2151] Flipping defaults (#3452) 2021-08-13 19:29:22 -04:00
Sagar Sumit
0544d70d8f [MINOR] Deprecate older configs (#3464)
Rename and deprecate props in HoodieWriteConfig

Rename and deprecate older props
2021-08-12 20:31:04 -07:00
Prashant Wason
76bc686a77 [HUDI-1292] Created a config to enable/disable syncing of metadata table. (#3427)
* [HUDI-1292] Created a config to enable/disable syncing of metadata table.

- Metadata Table should only be synced from a single pipeline to prevent conflicts.
- Skip syncing metadata table for clustering and compaction
- Renamed useFileListingMetadata

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-12 15:45:57 -07:00
zhangyue19921010
9e8308527a [HUDI-1518] Remove the logic that delete replaced file when archive (#3310)
* remove delete replaced file when archive

* done

* remove unsed import

* remove delete replaced files when archive realted UT

* code reviewed

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-08-11 10:54:44 -07:00
Y Ethan Guo
4783176554 [HUDI-1138] Add timeline-server-based marker file strategy for improving marker-related latency (#3233)
- Can be enabled for cloud stores like S3. Not supported for hdfs yet, due to partial write failures.
2021-08-11 11:48:13 -04:00
swuferhong
5448cdde7e [HUDI-2170] [HUDI-1763] Always choose the latest record for HoodieRecordPayload (#3401) 2021-08-11 10:20:55 +08:00
swuferhong
21db6d7a84 [HUDI-1771] Propagate CDC format for hoodie (#3285) 2021-08-10 20:23:23 +08:00
zhangyue19921010
b4441abcf7 [HUDI-2194] Skip the latest N partitions when choosing partitions to create ClusteringPlan (#3300)
* skip from latest partitions based on hoodie.clustering.plan.strategy.daybased.skipfromlatest.partitions && 0(default means skip nothing)

* change config verison

* add ut

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-08-09 10:10:15 -07:00
Sagar Sumit
70b6bd485f [HUDI-1468] Support custom clustering strategies and preserve commit metadata as part of clustering (#3419)
Co-authored-by: Satish Kotha <satishkotha@uber.com>
2021-08-06 22:53:08 -04:00
Danny Chan
02331fc223 [HUDI-2258] Metadata table for flink (#3381) 2021-08-04 10:54:55 +08:00
wenningd
91bb0d1318 [HUDI-2255] Refactor Datasource options (#3373)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-08-03 17:50:30 -07:00
satishkotha
826a04d142 [HUDI-2072] Add pre-commit validator framework (#3153)
* [HUDI-2072] Add pre-commit validator framework

* trigger Travis rebuild
2021-08-03 12:07:45 -07:00
Danny Chan
bec23bda50 [HUDI-2269] Release the disk map resource for flink streaming reader (#3384) 2021-08-03 13:55:35 +08:00
Sivabalan Narayanan
fe508376fa [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR table (#3315) 2021-08-02 09:45:09 -04:00
Gary Li
6353fc865f [HUDI-2218] Fix missing HoodieWriteStat in HoodieCreateHandle (#3341) 2021-07-30 02:36:57 -07:00
Shawy Geng
44e41dc9bb [HUDI-2117] Unpersist the input rdd after the commit is completed to … (#3207)
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-07-29 08:16:58 -07:00
pengzhiwei
bbadac7de1 [HUDI-1425] Performance loss with the additional hoodieRecords.isEmpty() in HoodieSparkSqlWriter#write (#2296) 2021-07-28 21:30:18 -07:00
rmahindra123
8fef50e237 [HUDI-2044] Integrate consumers with rocksDB and compression within External Spillable Map (#3318) 2021-07-28 01:31:03 -04:00
Sivabalan Narayanan
61148c1c43 [HUDI-2176, 2178, 2179] Adding virtual key support to COW table (#3306) 2021-07-26 17:21:04 -04:00
rmahindra123
a14b19fdd5 [HUDI-1241] Automate the generation of configs webpage as configs are added to Hudi repo (#3302) 2021-07-23 21:33:34 -07:00
Xuedong Luan
71e14cf866 [HUDI-2213] Remove unnecessary parameter for HoodieMetrics constructor and fix NPE in UT (#3333) 2021-07-23 19:57:35 +08:00
Xuedong Luan
6d592c5896 [HUDI-2211] Fix NullPointerException in TestHoodieConsoleMetrics (#3331) 2021-07-23 11:22:54 +08:00
pengzhiwei
5a2f3d439e [HUDI-2139] MergeInto MOR Table May Result InCorrect Result (#3230) 2021-07-23 10:19:43 +08:00
Samrat
a086d255c8 [HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer (#3184) 2021-07-19 21:49:43 -04:00
Sivabalan Narayanan
d5026e9a24 [HUDI-2161] Adding support to disable meta columns with bulk insert operation (#3247) 2021-07-19 20:43:48 -04:00
yuzhao.cyz
50c2b76d72 Revert "[HUDI-2087] Support Append only in Flink stream (#3252)"
This reverts commit 783c9cb3
2021-07-16 21:36:27 +08:00
liujinhui
3b264e80d9 [HUDI-1633] Make callback return HoodieWriteStat (#2445)
* CALLBACK add partitionPath

* callback can send hoodieWriteStat

* add ApiMaturityLevel
2021-07-16 12:37:07 +08:00
Jintao Guan
38cd74b563 [MINOR] Allow users to choose ORC as base file format in Spark SQL (#3279) 2021-07-16 12:24:41 +08:00
rmahindra123
d024439764 [HUDI-2029] Implement compression for DiskBasedMap in Spillable Map (#3128) 2021-07-14 22:57:38 -04:00
vinoth chandar
75040ee9e5 [HUDI-2149] Ensure and Audit docs for every configuration class in the codebase (#3272)
- Added docs when missing
 - Rewrote, reworded as needed
 - Made couple more classes extend HoodieConfig
2021-07-14 10:56:08 -07:00
zhangyue19921010
c1810f210e [MINOR] Correct the logs of enable/not-enable async cleaner service. (#3271)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-07-15 00:08:29 +08:00
Jintao Guan
2debb9b3ed [HUDI-1828] Update unit tests to support ORC as the base file format (#3237) 2021-07-15 00:05:42 +08:00