Commit Graph

1271 Commits

Author SHA1 Message Date
Gary Li
605b617cfa [HUDI-1434] fix incorrect log file path in HoodieWriteStat (#2300)
* [HUDI-1434] fix incorrect log file path in HoodieWriteStat

* HoodieWriteHandle#close() returns a list of WriteStatus objs

* Handle rolled-over log files and return a WriteStatus per log file written

 - Combined data and delete block logging into a single call
 - Lazily initialize and manage write status based on returned AppendResult
 - Use FSUtils.getFileSize() to set final file size, consistent with other handles
 - Added tests around returned values in AppendResult
 - Added validation of the file sizes returned in write stat

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-12-30 14:22:15 -08:00
wangxianghu
ef28763f08 [MINOR] Update report_coverage.sh (#2396) 2020-12-30 19:47:04 +08:00
Prashant Wason
c6bf952332 [HUDI-1493] Fixed schema compatibility check for fields. (#2350)
Some field type changes are allowed (e.g. int -> long) while maintaining schema backward compatibility within HUDI. The check was reversed, with the reader schema being passed in place of the writer schema.
2020-12-29 20:02:21 -05:00
Balajee Nagasubramaniam
e33a8f733c [HUDI-1147] Modify GenericRecordFullPayloadGenerator to generate vali… (#2045)
* [HUDI-1147] Modify GenericRecordFullPayloadGenerator to generate valid timestamps

Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2020-12-29 16:33:19 -05:00
Balajee Nagasubramaniam
da51aa64fc [HUDI-1474] Add additional unit tests to TestHBaseIndex (#2349) 2020-12-28 23:04:38 -05:00
pengzhiwei
b83d1d3e61 [HUDI-1484] Escape the partition value in HiveSyncTool (#2363) 2020-12-28 23:02:36 -05:00
steven zhang
4c17528de0 [HUDI-1398] Align insert file size for reducing IO (#2256)
* [HUDI-1398] Align insert file size for reducing IO

Co-authored-by: zhang wen <wen.zhang@dmall.com>
2020-12-28 22:52:35 -05:00
Danny Chan
0ecdec348e [MINOR] Remove the duplicate code in AbstractHoodieWriteClient.startCommit (#2385) 2020-12-29 10:49:24 +08:00
Danny Chan
76faf59652 [HUDI-1495] Upgrade Flink version to 1.12.0 (#2384) 2020-12-29 10:15:43 +08:00
lw0090
e177466fd2 [HUDI-1350] Support Partition level delete API in HUDI (#2254)
* [HUDI-1350] Support Partition level delete API in HUDI

* [HUDI-1350] Support Partition level delete API in HUDI base InsertOverwriteCommitAction
2020-12-28 15:01:06 -08:00
lw0090
6cdf59d92b [HUDI-1354] Block updates and replace on file groups in clustering (#2275)
* [HUDI-1354] Block updates and replace on file groups in clustering
2020-12-27 20:30:29 -08:00
lw0090
9e6889a8ce [HUDI-1481] add structured streaming and delta streamer clustering unit test (#2360) 2020-12-27 20:27:09 -08:00
Sivabalan Narayanan
8cf6a7223f [HUDI-1331] Adding support for validating entire dataset and long running tests in test suite framework (#2168)
* trigger rebuild

* [HUDI-1156] Remove unused dependencies from HoodieDeltaStreamerWrapper Class (#1927)

* Adding support for validating records and long running tests in test suite framework

* Adding partial validate node

* Fixing spark session initiation in Validate nodes

* Fixing validation

* Adding hive table validation to ValidateDatasetNode

* Rebasing with latest commits from master

* Addressing feedback

* Addressing comments

Co-authored-by: lamber-ken <lamberken@163.com>
Co-authored-by: linshan-ma <mabin194046@163.com>
2020-12-26 09:29:24 -08:00
Balaji Varadarajan
3ec9270e8e [HUDI-1490] Incremental Query should work even when there are partitions that have no incremental changes (#2371)
* Incremental Query should work even when there are partitions that have no incremental changes

Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2020-12-26 12:17:49 -05:00
lw0090
e807bb895e [HUDI-1487] fix unit test testCopyOnWriteStorage random failed (#2364) 2020-12-25 09:54:23 -08:00
wenningd
286055ce34 [HUDI-1451] Support bulk insert v2 with Spark 3.0.0 (#2328)
Co-authored-by: Wenning Ding <wenningd@amazon.com>

- Added support for bulk insert v2 with datasource v2 api in Spark 3.0.0.
2020-12-25 09:43:34 -05:00
wenningd
89f482eaf2 [HUDI-1489] Fix null pointer exception when reading updated written bootstrap table (#2370)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-23 11:26:24 -08:00
pengzhiwei
38b9264dd0 [HUDI-1488] Fix Test Case Failure in TestHBaseIndex (#2365) 2020-12-23 16:47:38 +08:00
wangxianghu
01ad449ad6 [HUDI-1485] Fix Deletes issued without any prior commits exception (#2361) 2020-12-22 23:10:19 +08:00
wangxianghu
f8ccb2872d [HUDI-1471] Make QuickStartUtils generate deletes according to specific ts (#2357) 2020-12-22 21:14:18 +08:00
satishkotha
959afb8ba4 Merge pull request #2263 from satishkotha/sk/clustering
[HUDI-1075] Implement simple clustering strategies to create and run ClusteringPlan
2020-12-21 19:18:18 -08:00
Satish Kotha
6dc03b65bf [HUDI-1075] Implement simple clustering strategies to create ClusteringPlan and to run the plan 2020-12-21 17:34:15 -08:00
jshmchenxi
0c821fecc2 [MINOR] Pass root exception to HoodieKeyGeneratorException for more information (#2354)
Co-authored-by: Xi Chen <chenxi07@qiyi.com>
2020-12-22 09:02:23 +08:00
Shen Hong
e4e2fbc3bb [HUDI-1419] Add base implementation for hudi java client (#2286) 2020-12-19 19:25:27 -08:00
Sivabalan Narayanan
33d338f392 [HUDI-115] Adding DefaultHoodieRecordPayload to honor ordering with combineAndGetUpdateValue (#2311)
* Added ability to pass in `properties` to payload methods, so they can perform table/record specific merges
* Added default methods so existing payload classes are backwards compatible. 
* Adding DefaultHoodiePayload to honor ordering while merging two records
* Fixing default payload based on feedback
2020-12-19 19:19:42 -08:00
Balajee Nagasubramaniam
5388c7f7a3 [HUDI-1470] Use the latest writer schema, when reading from existing parquet files in the hudi-test-suite (#2344) 2020-12-18 19:18:52 +08:00
lw0090
8b5d6f9430 [HUDI-1437] support more accurate spark JobGroup for better performance tracking (#2322) 2020-12-17 15:20:13 -08:00
Bhavani Sudha Saktheeswaran
14d5d1100c [HUDI-1406] Add date partition based source input selector for Delta streamer (#2264)
- Adds ability to list only recent date based partitions from source data.
- Parallelizes listing for faster tailing of DFSSources
2020-12-17 03:59:30 -08:00
wangxianghu
4ddfc61d70 [MINOR] Make QuickstartUtil generate random timestamp instead of 0 (#2340) 2020-12-17 18:00:23 +08:00
ChangLi
6a6b772c49 [MINOR] Fix error information in exception (#2341) 2020-12-16 19:37:01 +08:00
wenningd
26cdc457f6 [HUDI-1376] Drop Hudi metadata cols at the beginning of Spark datasource writing (#2233)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-15 16:20:48 -08:00
Danny Chan
93d9c25aee [MINOR] Improve code readability by passing in the fileComparisonsRDD in bloom index (#2319) 2020-12-14 22:35:24 -08:00
Balaji Varadarajan
069a1dcf24 [HUDI-1435] Fix bug in Marker File Reconciliation for Non-Partitioned datasets (#2301) 2020-12-14 22:24:12 -08:00
lw0090
facde4c16f [HUDI-1448] Hudi dla sync support skip rt table syncing (#2324) 2020-12-14 23:25:10 +08:00
steven zhang
11bc1fe6f4 [HUDI-1428] Clean old fileslice is invalid (#2292)
Co-authored-by: zhang wen <wen.zhang@dmall.com>
Co-authored-by: zhang wen <steven@stevendeMac-mini.local>
2020-12-13 06:28:53 -08:00
Shen Hong
236d1b0dec [HUDI-1439] Remove scala dependency from hudi-client-common (#2306) 2020-12-11 00:36:37 -08:00
wangxianghu
6cf25d5c8a [MINOR] Minor improve in IncrementalRelation (#2314) 2020-12-10 20:16:00 +08:00
Danny Chan
4bc45a391a [HUDI-1445] Refactor AbstractHoodieLogRecordScanner to use Builder (#2313) 2020-12-10 20:02:02 +08:00
Raymond Xu
bd9cceccb5 [HUDI-1395] Fix partition path using FSUtils (#2312)
Fixed the logic to get partition path in Copier and Exporter utilities.
2020-12-10 10:19:19 +08:00
wangxianghu
007014c1ef [MINOR] Throw an exception when keyGenerator initialization failed (#2307) 2020-12-10 09:56:19 +08:00
wenningd
fce1453fa6 [HUDI-1040] Make Hudi support Spark 3 (#2208)
* Fix flaky MOR unit test

* Update Spark APIs to make it be compatible with both spark2 & spark3

* Refactor bulk insert v2 part to make Hudi be able to compile with Spark3

* Add spark3 profile to handle fasterxml & spark version

* Create hudi-spark-common module & refactor hudi-spark related modules

Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-09 15:52:23 -08:00
jshmchenxi
3a91d26d62 fix typo (#2308)
Co-authored-by: Xi Chen <chenxi07@qiyi.com>
2020-12-08 06:28:20 -08:00
wangxianghu
de2fbeac33 [HUDI-1412] Make HoodieWriteConfig support setting different default … (#2278)
* [HUDI-1412] Make HoodieWriteConfig support setting different default value according to engine type
2020-12-07 09:29:53 +08:00
pengzhiwei
319b7a58e4 [HUDI-1427] Fix FileAlreadyExistsException when set HOODIE_AUTO_COMMIT_PROP to true (#2295) 2020-12-05 08:07:25 +08:00
liujinhui
62b392b49c [HUDI-1343] Add standard schema postprocessor which would rewrite the schema using spark-avro conversion (#2192)
Co-authored-by: liujh <liujh@t3go.cn>
2020-12-03 19:28:34 -08:00
lw0090
1f0d5c077e [HUDI-1349] spark sql support overwrite use insert_overwrite_table (#2196) 2020-12-03 12:26:21 -08:00
rmpifer
78fd122594 [HUDI-1196] Update HoodieKey when deduplicating records with global index (#2248)
- Works only for overwrite payload (default)
- Does not alter current semantics otherwise 

Co-authored-by: Ryan Pifer <ryanpife@amazon.com>
2020-12-01 13:50:46 -08:00
Prashant Wason
ac23d2587f [HUDI-1357] Added a check to validate records are not lost during merges. (#2216)
- Turned off by default
2020-12-01 13:44:57 -08:00
Guy Khazma
b826c53e33 [HUDI-1373] Add Support for OpenJ9 JVM (#2231)
* add support for OpenJ9 VM
* add 32bit openJ9
* Pulled the memory layout specs into their own classes.
2020-12-01 13:19:40 -08:00
pengzhiwei
36ce5bcd92 [HUDI-1424] Write Type changed to BULK_INSERT when set ENABLE_ROW_WRITER_OPT_KEY=true (#2289) 2020-11-30 23:07:21 +08:00