* [HUDI-2285] Adding synchronous updates to the metadata table before completion of commits in the data timeline.
- This patch adds synchronous updates to the metadata table: every write is first committed to the metadata table and then to the data table. When reading the metadata table, we ignore any delta commits that are present only in the metadata table and not in the data table timeline (see the sketch after this list).
- Compaction of the metadata table is fenced: it is triggered only when there are no inflight requests in the data table. This ensures that the base files in the metadata table are always in sync with the data table (without any holes); at most there may be some extra invalid commits among the delta log files in the metadata table.
- Due to this, archival of the data table also fences itself up to the compacted instant in the metadata table.
- All writes to the metadata table happen within the data table lock, so the metadata table operates in single-writer mode only. This might be tough to loosen, since all writers write to the same FILES partition and would conflict anyway.
- As part of this, we now acquire the data table lock for commit operations that previously did not take it (rollback, clean, compaction, clustering). Note that we are not doing any conflict resolution; all we do is commit under a lock, so that all writes to the metadata table go through a single writer.
- Also added a building block for assigning buckets to partitions, which will be leveraged by other indexes such as the record-level index. For now, the FILES partition has only one bucket. In general, any number of buckets per partition is allowed; each partition has a fixed fileId prefix, with an incremental suffix for each bucket within the partition.
- Fixed [HUDI-2476]: retrying a failed compaction that succeeded in the metadata table on the first attempt but then failed in the data table.
- Enabling metadata table by default.
- Adding more tests for the metadata table
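
A minimal sketch of the protocol above, under stated assumptions: every class and method name here is hypothetical and does not mirror Hudi's real APIs. It only illustrates the ordering (metadata table first, then data table, under the data table lock), the reader-side filtering of metadata-only commits, and the bucket fileId scheme.

```java
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the synchronous metadata update protocol described
// above. None of these types are Hudi's real classes; they illustrate ordering.
public class SyncMetadataCommitSketch {
  private final ReentrantLock dataTableLock = new ReentrantLock();

  interface Table {
    void commit(String instantTime);   // write and complete an instant
    Set<String> completedInstants();   // completed instants on the timeline
  }

  // Every write is committed to the metadata table first, then to the data
  // table, all under the data table lock (single-writer metadata table).
  void commit(Table metadataTable, Table dataTable, String instantTime) {
    dataTableLock.lock();
    try {
      metadataTable.commit(instantTime); // step 1: metadata table
      dataTable.commit(instantTime);     // step 2: data table
    } finally {
      dataTableLock.unlock();
    }
  }

  // Readers ignore delta commits present only in the metadata table: if the
  // writer crashed between step 1 and step 2, the orphan commit is invalid.
  boolean isValidMetadataInstant(String instant, Table dataTable) {
    return dataTable.completedInstants().contains(instant);
  }

  // Each partition has a fixed fileId prefix; bucket i within the partition
  // gets an incremental suffix. FILES currently has a single bucket (0).
  static String bucketFileId(String fileIdPrefix, int bucketIndex) {
    return fileIdPrefix + "-" + bucketIndex;
  }
}
```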
Co-authored-by: Prashant Wason <pwason@uber.com>
- Fixing packaging, naming of classes
- Use of log4j over slf4j for uniformity
- More follow-on fixes
- Added a version field to control/coordinator events.
- Eliminated the config added to write config
- Fixed fetching of checkpoints based on table type
- Clean up of naming, code placement
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
- Added upgrade and downgrade steps to and from 0.9.0. Upgrade adds a few table properties; downgrade recreates timeline-server-based marker files, if any.
- Added two sources for a two-stage pipeline: (a) S3EventsSource, which fetches events from SQS and ingests them into a meta Hoodie table; (b) S3EventsHoodieIncrSource, which reads S3 events from this meta Hoodie table, fetches the actual objects from S3, and ingests them into the sink Hoodie table. A configuration sketch follows this list.
- Added selectors to assist the S3EventsSource.
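
A hedged sketch of wiring the two-stage pipeline. The property keys below are assumptions for illustration only; consult S3EventsSource and S3EventsHoodieIncrSource for the authoritative names.

```java
import java.util.Properties;

// Hedged sketch of the two-stage S3 ingestion wiring. The property keys are
// assumptions; check S3EventsSource / S3EventsHoodieIncrSource for the
// authoritative names.
public class S3PipelineConfigSketch {
  public static void main(String[] args) {
    // Stage 1: S3EventsSource pulls S3 event notifications from an SQS queue
    // and ingests them into an intermediate "meta" Hoodie table.
    Properties stage1 = new Properties();
    stage1.setProperty("hoodie.deltastreamer.s3.source.queue.url",     // assumed key
        "https://sqs.us-east-1.amazonaws.com/123456789012/s3-events"); // placeholder queue
    stage1.setProperty("hoodie.deltastreamer.s3.source.queue.region",  // assumed key
        "us-east-1");

    // Stage 2: S3EventsHoodieIncrSource incrementally reads the meta table,
    // fetches the referenced S3 objects, and writes them to the sink table.
    Properties stage2 = new Properties();
    stage2.setProperty("hoodie.deltastreamer.source.hoodieincr.path",  // assumed key
        "s3a://bucket/path/to/meta-hoodie-table");                     // placeholder path

    System.out.println(stage1 + "\n" + stage2);
  }
}
```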
Co-authored-by: Satish M <84978833+satishmittal1111@users.noreply.github.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
* Adding support to ingest records written with an old schema after the table's schema has evolved (see the Avro sketch below)
* Rebasing against latest master
- Trimming test file to be < 800 lines
- Renaming config names
* Addressing feedback
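
An illustrative sketch of the underlying mechanism (plain Avro, not Hudi's internal code): a record serialized with the old writer schema is read against the evolved reader schema via Avro schema resolution, so the new nullable field is filled from its default.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.*;
import java.io.ByteArrayOutputStream;

// Standalone Avro example of reading an old-schema record against an evolved
// schema; illustrative of the feature above, not Hudi's actual code path.
public class OldSchemaIngestSketch {
  public static void main(String[] args) throws Exception {
    Schema oldSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"string\"}]}");
    Schema newSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"string\"},"
        + "{\"name\":\"city\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

    // Serialize a record with the old (writer) schema.
    GenericRecord rec = new GenericData.Record(oldSchema);
    rec.put("id", "abc");
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(oldSchema).write(rec, enc);
    enc.flush();

    // Deserialize with writer = old schema, reader = evolved schema; the
    // missing "city" field is populated from its default (null).
    BinaryDecoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
    GenericRecord evolved =
        new GenericDatumReader<GenericRecord>(oldSchema, newSchema).read(null, dec);
    System.out.println(evolved); // {"id": "abc", "city": null}
  }
}
```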
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
* [HUDI-1848] Adding support for HMS for running DDL queries in hive-sync-tool
* [HUDI-1848] Fixing test cases
* [HUDI-1848] CR changes
* [HUDI-1848] Fix checkstyle violations
* [HUDI-1848] Fixed a bug where the metastore API fails for complex schemas with multiple levels.
* [HUDI-1848] Adding the complex schema and resolving merge conflicts
* [HUDI-1848] Adding some more javadocs
* [HUDI-1848] Added javadocs for DDLExecutor impls
* [HUDI-1848] Fixed style issue
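
A hedged configuration sketch for the HMS support added in [HUDI-1848]: selecting the Hive-metastore-backed DDL executor for hive sync instead of JDBC/HiveQL. The exact property keys are assumptions; verify them against HiveSyncConfig.

```java
import java.util.Properties;

// Hedged sketch: choosing the Hive-metastore-backed DDLExecutor for hive
// sync. Key names reflect our best understanding of the options; verify
// against HiveSyncConfig before use.
public class HiveSyncModeSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("hoodie.datasource.hive_sync.enable", "true");
    props.setProperty("hoodie.datasource.hive_sync.mode", "hms");   // run DDL via the metastore API
    props.setProperty("hoodie.datasource.hive_sync.metastore.uris", // assumed key
        "thrift://localhost:9083");                                 // placeholder metastore URI
    System.out.println(props);
  }
}
```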
* [HUDI-1944] Support Hudi reading from the committed Kafka offset
* [HUDI-1944] Adding group option to KafkaResetOffsetStrategies
* [HUDI-1944] Update Exception msg
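
A hedged sketch of the [HUDI-1944] behavior: with a consumer group.id set, an offset reset strategy of "group" resumes from the group's committed offset rather than earliest/latest. The keys follow standard Kafka consumer configs; treat their use here as an assumption about how Hudi's Kafka source reads them.

```java
import java.util.Properties;

// Hedged sketch of the new "group" reset strategy added to
// KafkaResetOffsetStrategies; key names are standard Kafka consumer configs.
public class KafkaGroupOffsetSketch {
  public static void main(String[] args) {
    Properties kafkaProps = new Properties();
    kafkaProps.setProperty("bootstrap.servers", "localhost:9092"); // placeholder brokers
    kafkaProps.setProperty("group.id", "hudi-ingest");             // committed offsets live here
    kafkaProps.setProperty("auto.offset.reset", "group");          // new option from this change
    System.out.println(kafkaProps);
  }
}
```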