lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Danny Chan	abf3e3fe71	[HUDI-2548] Flink streaming reader misses the rolling over file handles (#3787 )	2021-10-14 10:36:18 +08:00
Sivabalan Narayanan	cff384d23f	[HUDI-2552] Fixing some test failures to unblock broken CI master (#3793 )	2021-10-13 18:44:43 -04:00
董可伦	48a3906ccc	[MINOR] Fix typo,'paritition' corrected to 'partition' (#3764 )	2021-10-11 14:07:34 -04:00
Roc Marshal	f14d4e65e7	[HUDI-2540] Fixed wrong validation for metadataTableEnabled in HoodieTable (#3781 )	2021-10-11 13:58:33 -04:00
Ilias Antoniou	ceace1c653	[HUDI-2496] Insert duplicate records when precombined is deactivated for "insert" operation (#3740 )	2021-10-10 21:33:16 -04:00
Danny Chan	ad63938890	[HUDI-2537] Fix metadata table for flink (#3774 )	2021-10-10 09:30:39 +08:00
Y Ethan Guo	2e152177fb	[HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module (#3743 )	2021-10-06 20:20:41 -04:00
Yann Byron	e91e611afb	[HUDI-2456] support 'show partitions' sql (#3693 )	2021-10-06 15:46:49 +08:00
Sivabalan Narayanan	5f32162a2f	[HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426 (#3590 ) * [HUDI-2285] Adding Synchronous updates to metadata before completion of commits in data timeline. - This patch adds synchronous updates to metadata table. In other words, every write is first committed to metadata table followed by data table. While reading metadata table, we ignore any delta commits that are present only in metadata table and not in data table timeline. - Compaction of metadata table is fenced by the condition that we trigger compaction only when there are no inflight requests in datatable. This ensures that all base files in metadata table is always in sync with data table(w/o any holes) and only there could be some extra invalid commits among delta log files in metadata table. - Due to this, archival of data table also fences itself up until compacted instant in metadata table. All writes to metadata table happens within the datatable lock. So, metadata table works in one writer mode only. This might be tough to loosen since all writers write to same FILES partition and so, will result in a conflict anyways. - As part of this, have added acquiring locks in data table for those operations which were not before while committing (rollback, clean, compaction, cluster). To note, we were not doing any conflict resolution. All we are doing here is to commit by taking a lock. So that all writes to metadata table is always a single writer. - Also added building block to add buckets for partitions, which will be leveraged by other indexes like record level index, etc. For now, FILES partition has only one bucket. In general, any number of buckets per partition is allowed and each partition has a fixed fileId prefix with incremental suffix for each bucket within each partition. Have fixed [HUDI-2476]. This fix is about retrying a failed compaction if it succeeded in metadata for first time, but failed w/ data table. - Enabling metadata table by default. - Adding more tests for metadata table Co-authored-by: Prashant Wason <pwason@uber.com>	2021-10-06 00:17:52 -04:00
Y Ethan Guo	46808dcb1f	[HUDI-2497] Refactor clean and restore actions in hudi-client module (#3734 )	2021-09-30 18:20:25 -04:00
Sivabalan Narayanan	f0585facd6	[HUDI-2474] Refreshing timeline for every operation in Hudi when metadata is enabled (#3698 )	2021-09-28 05:16:52 -04:00
Carl-Zhou-CN	aa546554ff	[HUDI-2451] On windows client with hdfs server for wrong file separator (#3687 ) Co-authored-by: yao.zhou <yao.zhou@linkflowtech.com>	2021-09-26 21:51:27 +08:00
Shawy Geng	06c2cc2c8b	[HUDI-2385] Make parquet dictionary encoding configurable (#3578 ) Co-authored-by: leesf <leesf@apache.org>	2021-09-24 13:33:34 +08:00
Sivabalan Narayanan	5091ab7311	[HUDI-2444] Fixing delete files corner cases wrt cleaning and rollback when applying changes to metadata (#3678 )	2021-09-20 11:05:31 -04:00
liujinhui	61d0096088	[HUDI-2434] Make periodSeconds of GraphiteReporter configurable (#3667 )	2021-09-17 19:39:55 +08:00
vinoth chandar	57d5da68aa	[HUDI-2330][HUDI-2335] Adding support for merge-on-read tables (#3679 ) - Inserts go into logs, hashed by Kafka and Hudi partitions - Fixed issues with the setupKafka script - Bumped up the default commit interval to 300 seconds - Minor renaming	2021-09-16 15:24:34 -07:00
Sivabalan Narayanan	b8dad628e5	[HUDI-2422] Adding rollback plan and rollback requested instant (#3651 ) - This patch introduces rollback plan and rollback.requested instant. Rollback will be done in two phases, namely rollback plan and rollback action. In planning, we prepare the rollback plan and serialize it to rollback.requested. In the rollback action phase, we fetch details from the plan and just delete the files as per the plan. This will ensure final rollback commit metadata will contain all files that got rolled back even if rollback failed midway and retried again.	2021-09-16 11:16:06 -04:00
liujinhui	2791fb9a96	[HUDI-2423] Separate some config logic from HoodieMetricsConfig into HoodieMetricsGraphiteConfig HoodieMetricsJmxConfig (#3652 )	2021-09-16 15:08:10 +08:00
zhangyue19921010	2d5ac55195	[HUDI-2355][Bug]Archive service executed after cleaner finished. (#3545 ) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-09-15 19:00:04 -04:00
Y Ethan Guo	916f12b7dd	[HUDI-2433] Refactor rollback actions in hudi-client module (#3664 )	2021-09-15 18:52:43 -04:00
Ankush Kanungo	4f991ee352	[HUDI-2398] Collect event time for inserts in DefaultHoodieRecordPayload (#3602 )	2021-09-11 20:27:40 -07:00
董可伦	6228b17a3d	[MINOR] Fix typo, 'requried' corrected to 'required' (#3643 )	2021-09-11 15:46:24 +08:00
rmahindra123	e528dd798a	[HUDI-2394] Implement Kafka Sink Protocol for Hudi for Ingesting Immutable Data (#3592 ) - Fixing packaging, naming of classes - Use of log4j over slf4j for uniformity - More follow-on fixes - Added a version to control/coordinator events. - Eliminated the config added to write config - Fixed fetching of checkpoints based on table type - Clean up of naming, code placement Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-09-10 18:20:26 -07:00
wangxianghu	44b9bc145e	[HUDI-2411] Remove unnecessary method overridden and note (#3636 )	2021-09-10 18:58:34 +08:00
Y Ethan Guo	56d08fbe70	[HUDI-2351] Extract common FS and IO utils for marker mechanism (#3529 )	2021-09-09 14:45:28 -04:00
liujinhui	3c4eb60913	Add the document to the PUSHGATEWAY configuration item (#3627 )	2021-09-09 15:53:58 +08:00
vinoth chandar	ea59a7ff5f	[HUDI-2080] Move to ubuntu-18.04 for Azure CI (#3409 ) Update Azure CI ubuntu from 16.04 to 18.04 due to 16.04 will be removed soon Fixed some consistently failed tests * fix TestCOWDataSourceStorage TestMORDataSourceStorage * reset mocks Also update readme badge Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2021-09-07 09:44:30 -07:00
Raymond Xu	6bd3ca98d6	[HUDI-1989] Fix flakiness in TestHoodieMergeOnReadTable (#3574 ) * [HUDI-1989] Refactor clustering tests for MoR table * refactor assertion helper * add CheckedFunction * SparkClientFunctionalTestHarness.java * put back original test case * move testcases out from TestHoodieMergeOnReadTable.java * add TestHoodieSparkMergeOnReadTableRollback.java * use SparkClientFunctionalTestHarness * add tag	2021-09-03 13:17:17 -07:00
Shawy Geng	21fd6edfe7	[HUDI-2384] Change log file size config to long (#3577 )	2021-09-02 11:14:09 +08:00
rmahindra123	d59c8044f8	[HUDI-2378] Add configs for common and pre validate (#3564 ) Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>	2021-08-30 23:28:35 -04:00
zhangyue19921010	de94787a85	[HUDI-2345] Hoodie columns sort partitioner for bulk insert (#3523 ) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-08-24 21:45:17 +08:00
Udit Mehrotra	e39d0a2f28	Keep non-conflicting names for common configs between DataSourceOptions and HoodieWriteConfig (#3511 )	2021-08-20 02:42:59 -07:00
Udit Mehrotra	c350d05dd3	Restore 0.8.0 config keys with deprecated annotation (#3506 ) Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-08-19 13:36:40 -07:00
ayachi_nene	99663d370b	[HUDI-2301] fix FileSliceMetrics utils bug (#3487 )	2021-08-17 11:09:53 -07:00
Y Ethan Guo	23dca6c237	[HUDI-2268] Add upgrade and downgrade to and from 0.9.0 (#3470 ) - Added upgrade and downgrade step to and from 0.9.0. Upgrade adds few table properties. Downgrade recreates timeline server based marker files if any.	2021-08-14 20:20:23 -04:00
Y Ethan Guo	9056c68744	[HUDI-2305] Add MARKERS.type and fix marker-based rollback (#3472 ) - Rollback infers the directory structure and does rollback based on the strategy used while markers were written. "write markers type" in write config is used to determine marker strategy only for new writes.	2021-08-14 08:18:49 -04:00
Prashant Wason	8eed440694	[HUDI-2119] Ensure the rolled-back instance was previously synced to the Metadata Table when syncing a Rollback Instant. (#3210 ) * [HUDI-2119] Ensure the rolled-back instance was previously synced to the Metadata Table when syncing a Rollback Instant. If the rolled-back instant was synced to the Metadata Table, a corresponding deltacommit with the same timestamp should have been created on the Metadata Table timeline. To ensure we can always perform this check, the Metadata Table instants should not be archived until their corresponding instants are present in the dataset timeline. But ensuring this requires a large number of instants to be kept on the metadata table. In this change, the metadata table will keep at least the number of instants that the main dataset is keeping. If the instant being rolled back was before the metadata table timeline, the code will throw an exception and the metadata table will have to be re-bootstrapped. This should be a very rare occurrence and should occur only when the dataset is being repaired by rolling back multiple commits or restoring to a much older time. * Fixed checkstyle * Improvements from review comments. Fixed checkstyle Replaced explicit null check with Option.ofNullable Removed redundant function getSynedInstantTime * Renamed getSyncedInstantTime and getSyncedInstantTimeForReader. Sync is confusing so renamed to getUpdateTime() and getReaderTime(). * Removed getReaderTime which is only for testing as the same method can be accessed during testing differently without making it part of the public interface. * Fix compilation error * Reverting changes to HoodieMetadataFileSystemView Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-08-13 21:23:34 -07:00
Sivabalan Narayanan	642b1b671d	[HUDI-2151] Flipping defaults (#3452 )	2021-08-13 19:29:22 -04:00
Sagar Sumit	0544d70d8f	[MINOR] Deprecate older configs (#3464 ) Rename and deprecate props in HoodieWriteConfig Rename and deprecate older props	2021-08-12 20:31:04 -07:00
Prashant Wason	76bc686a77	[HUDI-1292] Created a config to enable/disable syncing of metadata table. (#3427 ) * [HUDI-1292] Created a config to enable/disable syncing of metadata table. - Metadata Table should only be synced from a single pipeline to prevent conflicts. - Skip syncing metadata table for clustering and compaction - Renamed useFileListingMetadata Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-08-12 15:45:57 -07:00
zhangyue19921010	9e8308527a	[HUDI-1518] Remove the logic that delete replaced file when archive (#3310 ) * remove delete replaced file when archive * done * remove unused import * remove delete replaced files when archive related UT * code reviewed Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-08-11 10:54:44 -07:00
Y Ethan Guo	4783176554	[HUDI-1138] Add timeline-server-based marker file strategy for improving marker-related latency (#3233 ) - Can be enabled for cloud stores like S3. Not supported for hdfs yet, due to partial write failures.	2021-08-11 11:48:13 -04:00
swuferhong	5448cdde7e	[HUDI-2170] [HUDI-1763] Always choose the latest record for HoodieRecordPayload (#3401 )	2021-08-11 10:20:55 +08:00
swuferhong	21db6d7a84	[HUDI-1771] Propagate CDC format for hoodie (#3285 )	2021-08-10 20:23:23 +08:00
zhangyue19921010	b4441abcf7	[HUDI-2194] Skip the latest N partitions when choosing partitions to create ClusteringPlan (#3300 ) * skip from latest partitions based on hoodie.clustering.plan.strategy.daybased.skipfromlatest.partitions && 0(default means skip nothing) * change config version * add ut Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-08-09 10:10:15 -07:00
Sagar Sumit	70b6bd485f	[HUDI-1468] Support custom clustering strategies and preserve commit metadata as part of clustering (#3419 ) Co-authored-by: Satish Kotha <satishkotha@uber.com>	2021-08-06 22:53:08 -04:00
Danny Chan	02331fc223	[HUDI-2258] Metadata table for flink (#3381 )	2021-08-04 10:54:55 +08:00
wenningd	91bb0d1318	[HUDI-2255] Refactor Datasource options (#3373 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2021-08-03 17:50:30 -07:00
satishkotha	826a04d142	[HUDI-2072] Add pre-commit validator framework (#3153 ) * [HUDI-2072] Add pre-commit validator framework * trigger Travis rebuild	2021-08-03 12:07:45 -07:00
Danny Chan	bec23bda50	[HUDI-2269] Release the disk map resource for flink streaming reader (#3384 )	2021-08-03 13:55:35 +08:00

204 Commits