Commit Graph

1253 Commits

Author SHA1 Message Date
wangxianghu
01ad449ad6 [HUDI-1485] Fix Deletes issued without any prior commits exception (#2361) 2020-12-22 23:10:19 +08:00
wangxianghu
f8ccb2872d [HUDI-1471] Make QuickStartUtils generate deletes according to specific ts (#2357) 2020-12-22 21:14:18 +08:00
satishkotha
959afb8ba4 Merge pull request #2263 from satishkotha/sk/clustering
[HUDI-1075] Implement simple clustering strategies to create and run ClusteringPlan
2020-12-21 19:18:18 -08:00
Satish Kotha
6dc03b65bf [HUDI-1075] Implement simple clustering strategies to create ClusteringPlan and to run the plan 2020-12-21 17:34:15 -08:00
jshmchenxi
0c821fecc2 [MINOR] Pass root exception to HoodieKeyGeneratorException for more information (#2354)
Co-authored-by: Xi Chen <chenxi07@qiyi.com>
2020-12-22 09:02:23 +08:00
Shen Hong
e4e2fbc3bb [HUDI-1419] Add base implementation for hudi java client (#2286) 2020-12-19 19:25:27 -08:00
Sivabalan Narayanan
33d338f392 [HUDI-115] Adding DefaultHoodieRecordPayload to honor ordering with combineAndGetUpdateValue (#2311)
* Added ability to pass in `properties` to payload methods, so they can perform table/record specific merges
* Added default methods so existing payload classes are backwards compatible. 
* Adding DefaultHoodiePayload to honor ordering while merging two records
* Fixing default payload based on feedback
2020-12-19 19:19:42 -08:00
Balajee Nagasubramaniam
5388c7f7a3 [HUDI-1470] Use the latest writer schema, when reading from existing parquet files in the hudi-test-suite (#2344) 2020-12-18 19:18:52 +08:00
lw0090
8b5d6f9430 [HUDI-1437] support more accurate spark JobGroup for better performance tracking (#2322) 2020-12-17 15:20:13 -08:00
Bhavani Sudha Saktheeswaran
14d5d1100c [HUDI-1406] Add date partition based source input selector for Delta streamer (#2264)
- Adds ability to list only recent date based partitions from source data.
- Parallelizes listing for faster tailing of DFSSources
2020-12-17 03:59:30 -08:00
wangxianghu
4ddfc61d70 [MINOR] Make QuickstartUtil generate random timestamp instead of 0 (#2340) 2020-12-17 18:00:23 +08:00
ChangLi
6a6b772c49 [MINOR] Fix error information in exception (#2341) 2020-12-16 19:37:01 +08:00
wenningd
26cdc457f6 [HUDI-1376] Drop Hudi metadata cols at the beginning of Spark datasource writing (#2233)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-15 16:20:48 -08:00
Danny Chan
93d9c25aee [MINOR] Improve code readability by passing in the fileComparisonsRDD in bloom index (#2319) 2020-12-14 22:35:24 -08:00
Balaji Varadarajan
069a1dcf24 [HUDI-1435] Fix bug in Marker File Reconciliation for Non-Partitioned datasets (#2301) 2020-12-14 22:24:12 -08:00
lw0090
facde4c16f [HUDI-1448] Hudi dla sync support skip rt table syncing (#2324) 2020-12-14 23:25:10 +08:00
steven zhang
11bc1fe6f4 [HUDI-1428] Clean old fileslice is invalid (#2292)
Co-authored-by: zhang wen <wen.zhang@dmall.com>
Co-authored-by: zhang wen <steven@stevendeMac-mini.local>
2020-12-13 06:28:53 -08:00
Shen Hong
236d1b0dec [HUDI-1439] Remove scala dependency from hudi-client-common (#2306) 2020-12-11 00:36:37 -08:00
wangxianghu
6cf25d5c8a [MINOR] Minor improve in IncrementalRelation (#2314) 2020-12-10 20:16:00 +08:00
Danny Chan
4bc45a391a [HUDI-1445] Refactor AbstractHoodieLogRecordScanner to use Builder (#2313) 2020-12-10 20:02:02 +08:00
Raymond Xu
bd9cceccb5 [HUDI-1395] Fix partition path using FSUtils (#2312)
Fixed the logic to get partition path in Copier and Exporter utilities.
2020-12-10 10:19:19 +08:00
wangxianghu
007014c1ef [MINOR] Throw an exception when keyGenerator initialization failed (#2307) 2020-12-10 09:56:19 +08:00
wenningd
fce1453fa6 [HUDI-1040] Make Hudi support Spark 3 (#2208)
* Fix flaky MOR unit test

* Update Spark APIs to make it be compatible with both spark2 & spark3

* Refactor bulk insert v2 part to make Hudi be able to compile with Spark3

* Add spark3 profile to handle fasterxml & spark version

* Create hudi-spark-common module & refactor hudi-spark related modules

Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-09 15:52:23 -08:00
jshmchenxi
3a91d26d62 fix typo (#2308)
Co-authored-by: Xi Chen <chenxi07@qiyi.com>
2020-12-08 06:28:20 -08:00
wangxianghu
de2fbeac33 [HUDI-1412] Make HoodieWriteConfig support setting different default … (#2278)
* [HUDI-1412] Make HoodieWriteConfig support setting different default value according to engine type
2020-12-07 09:29:53 +08:00
pengzhiwei
319b7a58e4 [HUDI-1427] Fix FileAlreadyExistsException when set HOODIE_AUTO_COMMIT_PROP to true (#2295) 2020-12-05 08:07:25 +08:00
liujinhui
62b392b49c [HUDI-1343] Add standard schema postprocessor which would rewrite the schema using spark-avro conversion (#2192)
Co-authored-by: liujh <liujh@t3go.cn>
2020-12-03 19:28:34 -08:00
lw0090
1f0d5c077e [HUDI-1349] spark sql support overwrite use insert_overwrite_table (#2196) 2020-12-03 12:26:21 -08:00
rmpifer
78fd122594 [HUDI-1196] Update HoodieKey when deduplicating records with global index (#2248)
- Works only for overwrite payload (default)
- Does not alter current semantics otherwise 

Co-authored-by: Ryan Pifer <ryanpife@amazon.com>
2020-12-01 13:50:46 -08:00
Prashant Wason
ac23d2587f [HUDI-1357] Added a check to validate records are not lost during merges. (#2216)
- Turned off by default
2020-12-01 13:44:57 -08:00
Guy Khazma
b826c53e33 [HUDI-1373] Add Support for OpenJ9 JVM (#2231)
* add support for OpenJ9 VM
* add 32bit openJ9
* Pulled the memory layout specs into their own classes.
2020-12-01 13:19:40 -08:00
pengzhiwei
36ce5bcd92 [HUDI-1424] Write Type changed to BULK_INSERT when set ENABLE_ROW_WRITER_OPT_KEY=true (#2289) 2020-11-30 23:07:21 +08:00
leesf
3d5e9fee7f [MINOR] refactor code in HoodieMergeHandle (#2272) 2020-11-28 21:47:05 +08:00
steven zhang
56866a11fe [HUDI-1392] lose partition info when using spark parameter basePath (#2243)
Co-authored-by: zhang wen <wen.zhang@dmall.com>
2020-11-25 11:55:33 +08:00
Balaji Varadarajan
0ebef1c0a0 [HUDI-1358] Fix leaks in DiskBasedMap and LazyFileIterable (#2249) 2020-11-23 10:56:26 -08:00
wenningd
751e4ee882 [HUDI-1396] Fix for preventing bootstrap datasource jobs from hanging via spark-submit (#2253)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-11-23 10:43:24 -08:00
Shen Hong
d9411c38db [HUDI-1364] Add HoodieJavaEngineContext to hudi-java-client (#2222) 2020-11-23 10:06:28 -08:00
hongdd
971f028aaf [HUDI-1393] Add compaction action in archive command (#2246) 2020-11-23 16:53:01 +08:00
wangxianghu
537502a8ef [MINOR] Add apacheflink label (#2268) 2020-11-22 10:41:11 +08:00
Gary Li
c8d5ea2752 [MINOR] clean up and add comments to flink client (#2261) 2020-11-19 15:27:52 +08:00
pengzhiwei
d7af8caa45 [HUDI-1384] Decoupling hive jdbc dependency when HIVE_USE_JDBC_OPT_KEY set false (#2241) 2020-11-19 13:44:03 +08:00
wangxianghu
a23230c8c2 [HUDI-1400] Replace Operation enum with WriteOperationType (#2259) 2020-11-19 13:40:04 +08:00
wangxianghu
4d05680038 [HUDI-1327] Introduce base implementation of hudi-flink-client (#2176) 2020-11-18 17:57:11 +08:00
Karl_Wang
430d4b428e [HUDI-1377] remove duplicate code (#2235) 2020-11-10 10:08:08 -08:00
Balaji Varadarajan
42b6aeca28 [HUDI-1358] Fix Memory Leak in HoodieLogFormatWriter (#2217) 2020-11-09 19:26:13 -08:00
wenningd
0364498ae3 [HUDI-1375] Fix bug in HoodieAvroUtils.removeMetadataFields() method (#2232)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-11-05 17:30:17 -08:00
satishkotha
33ec88fc38 [HUDI-1352] Add FileSystemView APIs to query pending clustering operations (#2202) 2020-11-05 08:49:58 -08:00
lw0090
5f5c15b0d9 [HUDI-892] RealtimeParquetInputFormat skip adding projection columns if there are no log files (#2190)
* [HUDI-892] RealtimeParquetInputFormat skip adding projection columns if there are no log files
* [HUDI-892]  for test
* [HUDI-892]  fix bug generate array from split
* [HUDI-892] revert test log
2020-11-02 20:00:12 -08:00
wangxianghu
d160abb437 [HUDI-912] Refactor and relocate KeyGenerator to support more engines (#2200)
* [HUDI-912] Refactor and relocate KeyGenerator to support more engines

* Rename KeyGenerators
2020-11-02 13:12:51 -08:00
Venkatesh Rudraraju
59f995a3f5 Use RateLimiter instead of sleep. Repartition WriteStatus to optimize Hbase index writes (#1484) 2020-11-02 08:33:27 -08:00