lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Prashant Wason	76bc686a77	[HUDI-1292] Created a config to enable/disable syncing of the metadata table (#3427). The metadata table should only be synced from a single pipeline to prevent conflicts; skip syncing the metadata table for clustering and compaction; renamed useFileListingMetadata. Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-08-12 15:45:57 -07:00
Sivabalan Narayanan	b651336454	[HUDI-2294] Adding virtual keys support to DeltaStreamer (#3450)	2021-08-12 08:02:39 -04:00
liujinhui	c0fc9cdaf3	[MINOR] Move hoodie DeltaStreamer to hudi-utilities (#3459)	2021-08-12 18:19:05 +08:00
vinoyang	dc3cbb28e7	[MINOR] Correct TestKafkaSource class and comment (#3451)	2021-08-12 09:11:00 +08:00
Prashant Wason	b3e430f24b	[HUDI-2017] Add API to set a metric in the registry (#3084). Registry.add() adds the new value to the existing metric value, but some use cases need an API to set (replace) the existing value. The metadata table is synced in the preWrite() and postWrite() functions of a commit, and as part of each sync the current sizes and base file/log file counts are published as metrics. With Registry.add(), these counts and sizes were incorrectly published as the sum of the two values; using Registry.set() instead corrects this.	2021-08-11 16:47:16 -07:00
zhangyue19921010	9e8308527a	[HUDI-1518] Remove the logic that deletes replaced files during archiving (#3310). Also removes an unused import and the related unit tests. Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-08-11 10:54:44 -07:00
Y Ethan Guo	4783176554	[HUDI-1138] Add timeline-server-based marker file strategy for improving marker-related latency (#3233). Can be enabled for cloud stores such as S3; not yet supported for HDFS due to partial write failures.	2021-08-11 11:48:13 -04:00
Danny Chan	29332498af	[HUDI-2298] The HoodieMergedLogRecordScanner should set up the operation of the chosen record (#3456)	2021-08-11 22:55:43 +08:00
Prashant Wason	aa11989ead	[HUDI-2286] Handle the case of a failed deltacommit on the metadata table (#3428). A failed deltacommit on the metadata table is automatically rolled back: if the failed commit was t10, the rollback happens at the next instant, t11. After the rollback, syncing the dataset to the metadata table should look for all unsynced instants, including t11. The previous code ignored t11 because the latest commit timestamp on the metadata table was already t11 (the rollback itself).	2021-08-11 07:39:48 -07:00
Sivabalan Narayanan	c9fa3cffaf	[HUDI-1774] Adding support for delete_partitions to Spark data source (#3437)	2021-08-11 01:03:01 -04:00
Shawy Geng	a5e496fe23	[HUDI-2292] MOR should not push down predicates when reading with payload_combine type (#3443)	2021-08-11 12:17:39 +08:00
Raymond Xu	8255a86cb4	[HUDI-1939] Remove Joda-Time in hive sync module (#3430)	2021-08-10 20:25:41 -07:00
swuferhong	5448cdde7e	[HUDI-2170] [HUDI-1763] Always choose the latest record for HoodieRecordPayload (#3401)	2021-08-11 10:20:55 +08:00
Shawy Geng	d1b4aa59bf	[HUDI-2042] Compare the field object directly in OverwriteWithLatestAvroPayload (#3108)	2021-08-10 17:48:53 -04:00
Damon P. Cortesi	abbc8328e6	[MINOR] Fix contribution link in PULL_REQUEST_TEMPLATE (#3425)	2021-08-10 13:01:45 -07:00
vinoyang	0e1c592c69	[MINOR] Delete useless com.uber.hoodie.hadoop.hive.HoodieCombineHiveInputFormat (#3298)	2021-08-10 12:05:31 -07:00
Sivabalan Narayanan	1196736185	[HUDI-1129] Improving schema evolution support in Hudi (#2927). Adds support for ingesting records with an old schema after the table's schema has evolved, renames config names, and trims the test file to under 800 lines. Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-08-10 09:15:37 -07:00
zhangyue19921010	73d898322b	[MINOR] Fix travis from errors (#3432)	2021-08-10 08:25:49 -07:00
xuzifu666	a18bc839d1	[HUDI-2288] Support storage on KS3 for Hudi (#3434). Co-authored-by: xuzifu <xuzifu.com>	2021-08-10 23:18:12 +08:00
swuferhong	21db6d7a84	[HUDI-1771] Propagate CDC format for hoodie (#3285)	2021-08-10 20:23:23 +08:00
zhangyue19921010	b4441abcf7	[HUDI-2194] Skip the latest N partitions when choosing partitions to create ClusteringPlan (#3300). Skips the latest partitions based on hoodie.clustering.plan.strategy.daybased.skipfromlatest.partitions (default 0, meaning nothing is skipped); changes the config version and adds a unit test. Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-08-09 10:10:15 -07:00
pengzhiwei	41a9986a76	[HUDI-2208] Support Bulk Insert For Spark Sql (#3328)	2021-08-09 00:18:31 -04:00
yuzhaojing	11ea74958d	[HUDI-2247] Filter files whose length is less than the Parquet MAGIC length (#3363). Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>	2021-08-09 09:15:42 +08:00
pengzhiwei	32a50d8ddb	[HUDI-2243] Support Time Travel Query For Hoodie Table (#3360)	2021-08-07 19:07:22 -04:00
pengzhiwei	55d2e786db	[HUDI-1842] Spark Sql Support For pre-existing Hoodie Table (#3393)	2021-08-07 07:49:26 -04:00
Sagar Sumit	70b6bd485f	[HUDI-1468] Support custom clustering strategies and preserve commit metadata as part of clustering (#3419). Co-authored-by: Satish Kotha <satishkotha@uber.com>	2021-08-06 22:53:08 -04:00
pengzhiwei	9ce548edb1	[MINOR] Fix compile error in compaction command (#3421)	2021-08-06 16:18:19 +08:00
pengzhiwei	3f8ca1a355	[HUDI-2182] Support Compaction Command For Spark Sql (#3277)	2021-08-06 15:12:10 +08:00
Danny Chan	20feb1a897	[HUDI-2278] Use INT64 timestamp with precision 3 for flink parquet writer (#3414)	2021-08-06 11:06:21 +08:00
Danny Chan	b7586a5632	[HUDI-2274] Allows INSERT duplicates for Flink MOR table (#3403)	2021-08-06 10:30:52 +08:00
pengzhiwei	0dcd6a8fca	[HUDI-2233] Use HMS To Sync Hive Meta For Spark Sql (#3387)	2021-08-05 09:57:22 -04:00
Sivabalan Narayanan	1df5ded433	[HUDI-2273] Migrating some long running tests to functional test profile (#3398)	2021-08-04 19:08:50 -04:00
pengzhiwei	5574e092fb	[HUDI-2232] [SQL] MERGE INTO fails with table having nested struct (#3379)	2021-08-04 18:20:29 +08:00
yuzhaojing	b8b9d6db83	[HUDI-2087] Support Append only in Flink stream (#3390). Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>	2021-08-04 17:53:20 +08:00
Danny Chan	02331fc223	[HUDI-2258] Metadata table for flink (#3381)	2021-08-04 10:54:55 +08:00
rmahindra123	b4c14eaa29	[HUDI-2090] Ensure disk maps create a subfolder with appropriate prefixes and clean it up on close (#3329). Adds a UUID to the folder name for the external spillable file system, ensures that disk map folders do not interfere across users, and adds shutdown hooks for the disk map. Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>	2021-08-03 17:51:25 -07:00
wenningd	91bb0d1318	[HUDI-2255] Refactor Datasource options (#3373). Co-authored-by: Wenning Ding <wenningd@amazon.com>	2021-08-03 17:50:30 -07:00
Udit Mehrotra	1ff2d3459a	[HUDI-1371] [HUDI-1893] Support metadata based listing for Spark DataSource and Spark SQL (#2893)	2021-08-03 14:47:40 -07:00
rmahindra123	245e1fd17d	[HUDI-2272] Pass base file format to sync clients (#3397). Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>	2021-08-03 14:46:02 -07:00
satishkotha	826a04d142	[HUDI-2072] Add pre-commit validator framework (#3153)	2021-08-03 12:07:45 -07:00
Danny Chan	bec23bda50	[HUDI-2269] Release the disk map resource for flink streaming reader (#3384)	2021-08-03 13:55:35 +08:00
Sagar Sumit	aa857beee0	[HUDI-2225] Add a compaction job in hudi-examples (#3347)	2021-08-03 11:31:56 +08:00
vinoth chandar	b21ae68e67	[MINOR] Improving runtime of TestStructuredStreaming by 2 mins (#3382)	2021-08-02 13:42:46 -07:00
Sivabalan Narayanan	fe508376fa	[HUDI-2177][HUDI-2200] Adding virtual keys support for MOR table (#3315)	2021-08-02 09:45:09 -04:00
zhangyue19921010	dde57b293c	[HUDI-2164] Let users build a clustering plan and execute it at once using HoodieClusteringJob for async clustering (#3259). Adds --mode schedule/execute/scheduleandexecute and the unit test testHoodieAsyncClusteringJobWithScheduleAndExecute. Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-08-02 08:07:59 +08:00
Gary Li	6353fc865f	[HUDI-2218] Fix missing HoodieWriteStat in HoodieCreateHandle (#3341)	2021-07-30 02:36:57 -07:00
swuferhong	f7f5d4cc6d	[HUDI-2184] Support setting hive sync partition extractor class based on flink configuration (#3284)	2021-07-30 17:24:00 +08:00
Danny Chan	c4e45a0010	[HUDI-2254] Builtin sort operator for flink bulk insert (#3372)	2021-07-30 16:58:11 +08:00
swuferhong	8b19ec9ca0	[HUDI-2252] Default consumes from the latest instant for flink streaming reader (#3368)	2021-07-30 14:25:05 +08:00
Sivabalan Narayanan	7bdae69053	[HUDI-2253] Refactoring a few tests to reduce running time: DeltaStreamer and MultiDeltaStreamer tests, and bulk insert row writer tests (#3371). Co-authored-by: Sivabalan Narayanan <nsb@Sivabalans-MBP.attlocal.net>	2021-07-29 22:22:26 -07:00

1823 Commits