lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
Gary Li	605b617cfa	[HUDI-1434] fix incorrect log file path in HoodieWriteStat (#2300 ) * [HUDI-1434] fix incorrect log file path in HoodieWriteStat * HoodieWriteHandle#close() returns a list of WriteStatus objs * Handle rolled-over log files and return a WriteStatus per log file written - Combined data and delete block logging into a single call - Lazily initialize and manage write status based on returned AppendResult - Use FSUtils.getFileSize() to set final file size, consistent with other handles - Added tests around returned values in AppendResult - Added validation of the file sizes returned in write stat Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-12-30 14:22:15 -08:00
wangxianghu	ef28763f08	[MINOR] Update report_coverage.sh (#2396 )	2020-12-30 19:47:04 +08:00
Prashant Wason	c6bf952332	[HUDI-1493] Fixed schema compatibility check for fields. (#2350 ) Some field types changes are allowed (e.g. int -> long) while maintaining schema backward compatibility within HUDI. The check was reversed with the reader schema being passed for the write schema.	2020-12-29 20:02:21 -05:00
Balajee Nagasubramaniam	e33a8f733c	[HUDI-1147] Modify GenericRecordFullPayloadGenerator to generate vali… (#2045 ) * [HUDI-1147] Modify GenericRecordFullPayloadGenerator to generate valid timestamps Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>	2020-12-29 16:33:19 -05:00
Balajee Nagasubramaniam	da51aa64fc	[HUDI-1474] Add additional unit tests to TestHBaseIndex (#2349 )	2020-12-28 23:04:38 -05:00
pengzhiwei	b83d1d3e61	[HUDI-1484] Escape the partition value in HiveSyncTool (#2363 )	2020-12-28 23:02:36 -05:00
steven zhang	4c17528de0	[HUDI-1398] Align insert file size for reducing IO (#2256 ) * [HUDI-1398] Align insert file size for reducing IO Co-authored-by: zhang wen <wen.zhang@dmall.com>	2020-12-28 22:52:35 -05:00
Danny Chan	0ecdec348e	[MINOR] Remove the duplicate code in AbstractHoodieWriteClient.startCommit (#2385 )	2020-12-29 10:49:24 +08:00
Danny Chan	76faf59652	[HUDI-1495] Upgrade Flink version to 1.12.0 (#2384 )	2020-12-29 10:15:43 +08:00
lw0090	e177466fd2	[HUDI-1350] Support Partition level delete API in HUDI (#2254 ) * [HUDI-1350] Support Partition level delete API in HUDI * [HUDI-1350] Support Partition level delete API in HUDI base InsertOverwriteCommitAction * [HUDI-1350] Support Partition level delete API in HUDI base InsertOverwriteCommitAction	2020-12-28 15:01:06 -08:00
lw0090	6cdf59d92b	[HUDI-1354] Block updates and replace on file groups in clustering (#2275 ) * [HUDI-1354] Block updates and replace on file groups in clustering * [HUDI-1354] Block updates and replace on file groups in clustering	2020-12-27 20:30:29 -08:00
lw0090	9e6889a8ce	[HUDI-1481] add structured streaming and delta streamer clustering unit test (#2360 )	2020-12-27 20:27:09 -08:00
Sivabalan Narayanan	8cf6a7223f	[HUDI-1331] Adding support for validating entire dataset and long running tests in test suite framework (#2168 ) * trigger rebuild * [HUDI-1156] Remove unused dependencies from HoodieDeltaStreamerWrapper Class (#1927) * Adding support for validating records and long running tests in test suite framework * Adding partial validate node * Fixing spark session initiation in Validate nodes * Fixing validation * Adding hive table validation to ValidateDatasetNode * Rebasing with latest commits from master * Addressing feedback * Addressing comments Co-authored-by: lamber-ken <lamberken@163.com> Co-authored-by: linshan-ma <mabin194046@163.com>	2020-12-26 09:29:24 -08:00
Balaji Varadarajan	3ec9270e8e	[HUDI-1490] Incremental Query should work even when there are partitions that have no incremental changes (#2371 ) * Incremental Query should work even when there are partitions that have no incremental changes Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>	2020-12-26 12:17:49 -05:00
lw0090	e807bb895e	[HUDI-1487] fix unit test testCopyOnWriteStorage random failed (#2364 )	2020-12-25 09:54:23 -08:00
wenningd	286055ce34	[HUDI-1451] Support bulk insert v2 with Spark 3.0.0 (#2328 ) Co-authored-by: Wenning Ding <wenningd@amazon.com> - Added support for bulk insert v2 with datasource v2 api in Spark 3.0.0.	2020-12-25 09:43:34 -05:00
wenningd	89f482eaf2	[HUDI-1489] Fix null pointer exception when reading updated written bootstrap table (#2370 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-12-23 11:26:24 -08:00
pengzhiwei	38b9264dd0	[HUDI-1488] Fix Test Case Failure in TestHBaseIndex (#2365 )	2020-12-23 16:47:38 +08:00
wangxianghu	01ad449ad6	[HUDI-1485] Fix Deletes issued without any prior commits exception (#2361 )	2020-12-22 23:10:19 +08:00
wangxianghu	f8ccb2872d	[HUDI-1471] Make QuickStartUtils generate deletes according to specific ts (#2357 )	2020-12-22 21:14:18 +08:00
satishkotha	959afb8ba4	Merge pull request #2263 from satishkotha/sk/clustering [HUDI-1075] Implement simple clustering strategies to create and run ClusteringPlan	2020-12-21 19:18:18 -08:00
Satish Kotha	6dc03b65bf	[HUDI-1075] Implement simple clustering strategies to create ClusteringPlan and to run the plan	2020-12-21 17:34:15 -08:00
jshmchenxi	0c821fecc2	[MINOR] Pass root exception to HoodieKeyGeneratorException for more information (#2354 ) Co-authored-by: Xi Chen <chenxi07@qiyi.com>	2020-12-22 09:02:23 +08:00
Shen Hong	e4e2fbc3bb	[HUDI-1419] Add base implementation for hudi java client (#2286 )	2020-12-19 19:25:27 -08:00
Sivabalan Narayanan	33d338f392	[HUDI-115] Adding DefaultHoodieRecordPayload to honor ordering with combineAndGetUpdateValue (#2311 ) * Added ability to pass in `properties` to payload methods, so they can perform table/record specific merges * Added default methods so existing payload classes are backwards compatible. * Adding DefaultHoodiePayload to honor ordering while merging two records * Fixing default payload based on feedback	2020-12-19 19:19:42 -08:00
Balajee Nagasubramaniam	5388c7f7a3	[HUDI-1470] Use the latest writer schema, when reading from existing parquet files in the hudi-test-suite (#2344 )	2020-12-18 19:18:52 +08:00
lw0090	8b5d6f9430	[HUDI-1437] support more accurate spark JobGroup for better performance tracking (#2322 )	2020-12-17 15:20:13 -08:00
Bhavani Sudha Saktheeswaran	14d5d1100c	[HUDI-1406] Add date partition based source input selector for Delta streamer (#2264 ) - Adds ability to list only recent date based partitions from source data. - Parallelizes listing for faster tailing of DFSSources	2020-12-17 03:59:30 -08:00
wangxianghu	4ddfc61d70	[MINOR] Make QuickstartUtil generate random timestamp instead of 0 (#2340 )	2020-12-17 18:00:23 +08:00
ChangLi	6a6b772c49	[MINOR] Fix error information in exception (#2341 )	2020-12-16 19:37:01 +08:00
wenningd	26cdc457f6	[HUDI-1376] Drop Hudi metadata cols at the beginning of Spark datasource writing (#2233 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-12-15 16:20:48 -08:00
Danny Chan	93d9c25aee	[MINOR] Improve code readability by passing in the fileComparisonsRDD in bloom index (#2319 )	2020-12-14 22:35:24 -08:00
Balaji Varadarajan	069a1dcf24	[HUDI-1435] Fix bug in Marker File Reconciliation for Non-Partitioned datasets (#2301 )	2020-12-14 22:24:12 -08:00
lw0090	facde4c16f	[HUDI-1448] Hudi dla sync support skip rt table syncing (#2324 )	2020-12-14 23:25:10 +08:00
steven zhang	11bc1fe6f4	[HUDI-1428] Clean old fileslice is invalid (#2292 ) Co-authored-by: zhang wen <wen.zhang@dmall.com> Co-authored-by: zhang wen <steven@stevendeMac-mini.local>	2020-12-13 06:28:53 -08:00
Shen Hong	236d1b0dec	[HUDI-1439] Remove scala dependency from hudi-client-common (#2306 )	2020-12-11 00:36:37 -08:00
wangxianghu	6cf25d5c8a	[MINOR] Minor improve in IncrementalRelation (#2314 )	2020-12-10 20:16:00 +08:00
Danny Chan	4bc45a391a	[HUDI-1445] Refactor AbstractHoodieLogRecordScanner to use Builder (#2313 )	2020-12-10 20:02:02 +08:00
Raymond Xu	bd9cceccb5	[HUDI-1395] Fix partition path using FSUtils (#2312 ) Fixed the logic to get partition path in Copier and Exporter utilities.	2020-12-10 10:19:19 +08:00
wangxianghu	007014c1ef	[MINOR] Throw an exception when keyGenerator initialization failed (#2307 )	2020-12-10 09:56:19 +08:00
wenningd	fce1453fa6	[HUDI-1040] Make Hudi support Spark 3 (#2208 ) * Fix flaky MOR unit test * Update Spark APIs to make it be compatible with both spark2 & spark3 * Refactor bulk insert v2 part to make Hudi be able to compile with Spark3 * Add spark3 profile to handle fasterxml & spark version * Create hudi-spark-common module & refactor hudi-spark related modules Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-12-09 15:52:23 -08:00
jshmchenxi	3a91d26d62	fix typo (#2308 ) Co-authored-by: Xi Chen <chenxi07@qiyi.com>	2020-12-08 06:28:20 -08:00
wangxianghu	de2fbeac33	[HUDI-1412] Make HoodieWriteConfig support setting different default … (#2278 ) * [HUDI-1412] Make HoodieWriteConfig support setting different default value according to engine type	2020-12-07 09:29:53 +08:00
pengzhiwei	319b7a58e4	[HUDI-1427] Fix FileAlreadyExistsException when set HOODIE_AUTO_COMMIT_PROP to true (#2295 )	2020-12-05 08:07:25 +08:00
liujinhui	62b392b49c	[HUDI-1343] Add standard schema postprocessor which would rewrite the schema using spark-avro conversion (#2192 ) Co-authored-by: liujh <liujh@t3go.cn>	2020-12-03 19:28:34 -08:00
lw0090	1f0d5c077e	[HUDI-1349] spark sql support overwrite use insert_overwrite_table (#2196 )	2020-12-03 12:26:21 -08:00
rmpifer	78fd122594	[HUDI-1196] Update HoodieKey when deduplicating records with global index (#2248 ) - Works only for overwrite payload (default) - Does not alter current semantics otherwise Co-authored-by: Ryan Pifer <ryanpife@amazon.com>	2020-12-01 13:50:46 -08:00
Prashant Wason	ac23d2587f	[HUDI-1357] Added a check to validate records are not lost during merges. (#2216 ) - Turned off by default	2020-12-01 13:44:57 -08:00
Guy Khazma	b826c53e33	[HUDI-1373] Add Support for OpenJ9 JVM (#2231 ) * add support for OpenJ9 VM * add 32bit openJ9 * Pulled the memory layout specs into their own classes.	2020-12-01 13:19:40 -08:00
pengzhiwei	36ce5bcd92	[HUDI-1424] Write Type changed to BULK_INSERT when set ENABLE_ROW_WRITER_OPT_KEY=true (#2289 )	2020-11-30 23:07:21 +08:00

1271 Commits