lw0090
e177466fd2
[HUDI-1350] Support Partition level delete API in HUDI ( #2254 )
* [HUDI-1350] Support Partition level delete API in HUDI
* [HUDI-1350] Support Partition level delete API in HUDI base InsertOverwriteCommitAction
2020-12-28 15:01:06 -08:00
lw0090
6cdf59d92b
[HUDI-1354] Block updates and replace on file groups in clustering ( #2275 )
* [HUDI-1354] Block updates and replace on file groups in clustering
2020-12-27 20:30:29 -08:00
lw0090
9e6889a8ce
[HUDI-1481] add structured streaming and delta streamer clustering unit test ( #2360 )
2020-12-27 20:27:09 -08:00
Sivabalan Narayanan
8cf6a7223f
[HUDI-1331] Adding support for validating entire dataset and long running tests in test suite framework ( #2168 )
* trigger rebuild
* [HUDI-1156] Remove unused dependencies from HoodieDeltaStreamerWrapper Class (#1927 )
* Adding support for validating records and long running tests in test suite framework
* Adding partial validate node
* Fixing spark session initiation in Validate nodes
* Fixing validation
* Adding hive table validation to ValidateDatasetNode
* Rebasing with latest commits from master
* Addressing feedback
* Addressing comments
Co-authored-by: lamber-ken <lamberken@163.com >
Co-authored-by: linshan-ma <mabin194046@163.com >
2020-12-26 09:29:24 -08:00
Balaji Varadarajan
3ec9270e8e
[HUDI-1490] Incremental Query should work even when there are partitions that have no incremental changes ( #2371 )
* Incremental Query should work even when there are partitions that have no incremental changes
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com >
2020-12-26 12:17:49 -05:00
lw0090
e807bb895e
[HUDI-1487] fix unit test testCopyOnWriteStorage random failed ( #2364 )
2020-12-25 09:54:23 -08:00
wenningd
286055ce34
[HUDI-1451] Support bulk insert v2 with Spark 3.0.0 ( #2328 )
Co-authored-by: Wenning Ding <wenningd@amazon.com >
- Added support for bulk insert v2 with datasource v2 api in Spark 3.0.0.
2020-12-25 09:43:34 -05:00
wenningd
89f482eaf2
[HUDI-1489] Fix null pointer exception when reading updated written bootstrap table ( #2370 )
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2020-12-23 11:26:24 -08:00
pengzhiwei
38b9264dd0
[HUDI-1488] Fix Test Case Failure in TestHBaseIndex ( #2365 )
2020-12-23 16:47:38 +08:00
wangxianghu
01ad449ad6
[HUDI-1485] Fix Deletes issued without any prior commits exception ( #2361 )
2020-12-22 23:10:19 +08:00
wangxianghu
f8ccb2872d
[HUDI-1471] Make QuickStartUtils generate deletes according to specific ts ( #2357 )
2020-12-22 21:14:18 +08:00
satishkotha
959afb8ba4
Merge pull request #2263 from satishkotha/sk/clustering
[HUDI-1075] Implement simple clustering strategies to create and run ClusteringPlan
2020-12-21 19:18:18 -08:00
Satish Kotha
6dc03b65bf
[HUDI-1075] Implement simple clustering strategies to create ClusteringPlan and to run the plan
2020-12-21 17:34:15 -08:00
jshmchenxi
0c821fecc2
[MINOR] Pass root exception to HoodieKeyGeneratorException for more information ( #2354 )
Co-authored-by: Xi Chen <chenxi07@qiyi.com >
2020-12-22 09:02:23 +08:00
Shen Hong
e4e2fbc3bb
[HUDI-1419] Add base implementation for hudi java client ( #2286 )
2020-12-19 19:25:27 -08:00
Sivabalan Narayanan
33d338f392
[HUDI-115] Adding DefaultHoodieRecordPayload to honor ordering with combineAndGetUpdateValue ( #2311 )
* Added ability to pass in `properties` to payload methods, so they can perform table/record specific merges
* Added default methods so existing payload classes are backwards compatible.
* Adding DefaultHoodiePayload to honor ordering while merging two records
* Fixing default payload based on feedback
2020-12-19 19:19:42 -08:00
Balajee Nagasubramaniam
5388c7f7a3
[HUDI-1470] Use the latest writer schema, when reading from existing parquet files in the hudi-test-suite ( #2344 )
2020-12-18 19:18:52 +08:00
lw0090
8b5d6f9430
[HUDI-1437] support more accurate spark JobGroup for better performance tracking ( #2322 )
2020-12-17 15:20:13 -08:00
Bhavani Sudha Saktheeswaran
14d5d1100c
[HUDI-1406] Add date partition based source input selector for Delta streamer ( #2264 )
- Adds ability to list only recent date based partitions from source data.
- Parallelizes listing for faster tailing of DFSSources
2020-12-17 03:59:30 -08:00
wangxianghu
4ddfc61d70
[MINOR] Make QuickstartUtil generate random timestamp instead of 0 ( #2340 )
2020-12-17 18:00:23 +08:00
ChangLi
6a6b772c49
[MINOR] Fix error information in exception ( #2341 )
2020-12-16 19:37:01 +08:00
wenningd
26cdc457f6
[HUDI-1376] Drop Hudi metadata cols at the beginning of Spark datasource writing ( #2233 )
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2020-12-15 16:20:48 -08:00
Danny Chan
93d9c25aee
[MINOR] Improve code readability by passing in the fileComparisonsRDD in bloom index ( #2319 )
2020-12-14 22:35:24 -08:00
Balaji Varadarajan
069a1dcf24
[HUDI-1435] Fix bug in Marker File Reconciliation for Non-Partitioned datasets ( #2301 )
2020-12-14 22:24:12 -08:00
lw0090
facde4c16f
[HUDI-1448] Hudi dla sync support skip rt table syncing ( #2324 )
2020-12-14 23:25:10 +08:00
steven zhang
11bc1fe6f4
[HUDI-1428] Clean old fileslice is invalid ( #2292 )
Co-authored-by: zhang wen <wen.zhang@dmall.com >
Co-authored-by: zhang wen <steven@stevendeMac-mini.local >
2020-12-13 06:28:53 -08:00
Shen Hong
236d1b0dec
[HUDI-1439] Remove scala dependency from hudi-client-common ( #2306 )
2020-12-11 00:36:37 -08:00
wangxianghu
6cf25d5c8a
[MINOR] Minor improve in IncrementalRelation ( #2314 )
2020-12-10 20:16:00 +08:00
Danny Chan
4bc45a391a
[HUDI-1445] Refactor AbstractHoodieLogRecordScanner to use Builder ( #2313 )
2020-12-10 20:02:02 +08:00
Raymond Xu
bd9cceccb5
[HUDI-1395] Fix partition path using FSUtils ( #2312 )
Fixed the logic to get partition path in Copier and Exporter utilities.
2020-12-10 10:19:19 +08:00
wangxianghu
007014c1ef
[MINOR] Throw an exception when keyGenerator initialization failed ( #2307 )
2020-12-10 09:56:19 +08:00
wenningd
fce1453fa6
[HUDI-1040] Make Hudi support Spark 3 ( #2208 )
* Fix flaky MOR unit test
* Update Spark APIs to make it be compatible with both spark2 & spark3
* Refactor bulk insert v2 part to make Hudi be able to compile with Spark3
* Add spark3 profile to handle fasterxml & spark version
* Create hudi-spark-common module & refactor hudi-spark related modules
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2020-12-09 15:52:23 -08:00
jshmchenxi
3a91d26d62
fix typo ( #2308 )
Co-authored-by: Xi Chen <chenxi07@qiyi.com >
2020-12-08 06:28:20 -08:00
wangxianghu
de2fbeac33
[HUDI-1412] Make HoodieWriteConfig support setting different default … ( #2278 )
* [HUDI-1412] Make HoodieWriteConfig support setting different default value according to engine type
2020-12-07 09:29:53 +08:00
pengzhiwei
319b7a58e4
[HUDI-1427] Fix FileAlreadyExistsException when set HOODIE_AUTO_COMMIT_PROP to true ( #2295 )
2020-12-05 08:07:25 +08:00
liujinhui
62b392b49c
[HUDI-1343] Add standard schema postprocessor which would rewrite the schema using spark-avro conversion ( #2192 )
Co-authored-by: liujh <liujh@t3go.cn >
2020-12-03 19:28:34 -08:00
lw0090
1f0d5c077e
[HUDI-1349] spark sql support overwrite use insert_overwrite_table ( #2196 )
2020-12-03 12:26:21 -08:00
rmpifer
78fd122594
[HUDI-1196] Update HoodieKey when deduplicating records with global index ( #2248 )
- Works only for overwrite payload (default)
- Does not alter current semantics otherwise
Co-authored-by: Ryan Pifer <ryanpife@amazon.com >
2020-12-01 13:50:46 -08:00
Prashant Wason
ac23d2587f
[HUDI-1357] Added a check to validate records are not lost during merges. ( #2216 )
- Turned off by default
2020-12-01 13:44:57 -08:00
Guy Khazma
b826c53e33
[HUDI-1373] Add Support for OpenJ9 JVM ( #2231 )
* add support for OpenJ9 VM
* add 32bit openJ9
* Pulled the memory layout specs into their own classes.
2020-12-01 13:19:40 -08:00
pengzhiwei
36ce5bcd92
[HUDI-1424] Write Type changed to BULK_INSERT when set ENABLE_ROW_WRITER_OPT_KEY=true ( #2289 )
2020-11-30 23:07:21 +08:00
leesf
3d5e9fee7f
[MINOR] refactor code in HoodieMergeHandle ( #2272 )
2020-11-28 21:47:05 +08:00
steven zhang
56866a11fe
[HUDI-1392] lose partition info when using spark parameter basePath ( #2243 )
Co-authored-by: zhang wen <wen.zhang@dmall.com >
2020-11-25 11:55:33 +08:00
Balaji Varadarajan
0ebef1c0a0
[HUDI-1358] Fix leaks in DiskBasedMap and LazyFileIterable ( #2249 )
2020-11-23 10:56:26 -08:00
wenningd
751e4ee882
[HUDI-1396] Fix for preventing bootstrap datasource jobs from hanging via spark-submit ( #2253 )
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2020-11-23 10:43:24 -08:00
Shen Hong
d9411c38db
[HUDI-1364] Add HoodieJavaEngineContext to hudi-java-client ( #2222 )
2020-11-23 10:06:28 -08:00
hongdd
971f028aaf
[HUDI-1393] Add compaction action in archive command ( #2246 )
2020-11-23 16:53:01 +08:00
wangxianghu
537502a8ef
[MINOR] Add apacheflink label ( #2268 )
2020-11-22 10:41:11 +08:00
Gary Li
c8d5ea2752
[MINOR] clean up and add comments to flink client ( #2261 )
2020-11-19 15:27:52 +08:00
pengzhiwei
d7af8caa45
[HUDI-1384] Decoupling hive jdbc dependency when HIVE_USE_JDBC_OPT_KEY set false ( #2241 )
2020-11-19 13:44:03 +08:00