Balajee Nagasubramaniam
da51aa64fc
[HUDI-1474] Add additional unit tests to TestHBaseIndex ( #2349 )
2020-12-28 23:04:38 -05:00
steven zhang
4c17528de0
[HUDI-1398] Align insert file size for reducing IO ( #2256 )
...
* [HUDI-1398] Align insert file size for reducing IO
Co-authored-by: zhang wen <wen.zhang@dmall.com >
2020-12-28 22:52:35 -05:00
lw0090
e177466fd2
[HUDI-1350] Support Partition level delete API in HUDI ( #2254 )
...
* [HUDI-1350] Support Partition level delete API in HUDI
* [HUDI-1350] Support Partition level delete API in HUDI base InsertOverwriteCommitAction
2020-12-28 15:01:06 -08:00
lw0090
6cdf59d92b
[HUDI-1354] Block updates and replace on file groups in clustering ( #2275 )
...
* [HUDI-1354] Block updates and replace on file groups in clustering
2020-12-27 20:30:29 -08:00
pengzhiwei
38b9264dd0
[HUDI-1488] Fix Test Case Failure in TestHBaseIndex ( #2365 )
2020-12-23 16:47:38 +08:00
satishkotha
959afb8ba4
Merge pull request #2263 from satishkotha/sk/clustering
...
[HUDI-1075] Implement simple clustering strategies to create and run ClusteringPlan
2020-12-21 19:18:18 -08:00
Satish Kotha
6dc03b65bf
[HUDI-1075] Implement simple clustering strategies to create ClusteringPlan and to run the plan
2020-12-21 17:34:15 -08:00
jshmchenxi
0c821fecc2
[MINOR] Pass root exception to HoodieKeyGeneratorException for more information ( #2354 )
...
Co-authored-by: Xi Chen <chenxi07@qiyi.com >
2020-12-22 09:02:23 +08:00
lw0090
8b5d6f9430
[HUDI-1437] support more accurate spark JobGroup for better performance tracking ( #2322 )
2020-12-17 15:20:13 -08:00
Danny Chan
93d9c25aee
[MINOR] Improve code readability by passing in the fileComparisonsRDD in bloom index ( #2319 )
2020-12-14 22:35:24 -08:00
steven zhang
11bc1fe6f4
[HUDI-1428] Clean old fileslice is invalid ( #2292 )
...
Co-authored-by: zhang wen <wen.zhang@dmall.com >
Co-authored-by: zhang wen <steven@stevendeMac-mini.local >
2020-12-13 06:28:53 -08:00
Shen Hong
236d1b0dec
[HUDI-1439] Remove scala dependency from hudi-client-common ( #2306 )
2020-12-11 00:36:37 -08:00
Danny Chan
4bc45a391a
[HUDI-1445] Refactor AbstractHoodieLogRecordScanner to use Builder ( #2313 )
2020-12-10 20:02:02 +08:00
wenningd
fce1453fa6
[HUDI-1040] Make Hudi support Spark 3 ( #2208 )
...
* Fix flaky MOR unit test
* Update Spark APIs to be compatible with both spark2 & spark3
* Refactor bulk insert v2 part so that Hudi can compile with Spark3
* Add spark3 profile to handle fasterxml & spark version
* Create hudi-spark-common module & refactor hudi-spark related modules
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2020-12-09 15:52:23 -08:00
lw0090
1f0d5c077e
[HUDI-1349] spark sql support overwrite use insert_overwrite_table ( #2196 )
2020-12-03 12:26:21 -08:00
rmpifer
78fd122594
[HUDI-1196] Update HoodieKey when deduplicating records with global index ( #2248 )
...
- Works only for overwrite payload (default)
- Does not alter current semantics otherwise
Co-authored-by: Ryan Pifer <ryanpife@amazon.com >
2020-12-01 13:50:46 -08:00
Prashant Wason
ac23d2587f
[HUDI-1357] Added a check to validate records are not lost during merges. ( #2216 )
...
- Turned off by default
2020-12-01 13:44:57 -08:00
wangxianghu
4d05680038
[HUDI-1327] Introduce base implementation of hudi-flink-client ( #2176 )
2020-11-18 17:57:11 +08:00
Balaji Varadarajan
42b6aeca28
[HUDI-1358] Fix Memory Leak in HoodieLogFormatWriter ( #2217 )
2020-11-09 19:26:13 -08:00
wangxianghu
d160abb437
[HUDI-912] Refactor and relocate KeyGenerator to support more engines ( #2200 )
...
* [HUDI-912] Refactor and relocate KeyGenerator to support more engines
* Rename KeyGenerators
2020-11-02 13:12:51 -08:00
Venkatesh Rudraraju
59f995a3f5
Use RateLimiter instead of sleep. Repartition WriteStatus to optimize Hbase index writes ( #1484 )
2020-11-02 08:33:27 -08:00
wangxianghu
e206ddd431
[MINOR] Private the NoArgsConstructor of SparkMergeHelper and code clean ( #2194 )
2020-10-26 12:22:11 +08:00
Raymond Xu
14c4611857
[MINOR] Fix caller to SparkBulkInsertCommitActionExecutor ( #2195 )
...
Fixed calling the wrong constructor
2020-10-21 19:50:10 -07:00
lw0090
4d80e1e221
[HUDI-284] add more test for UpdateSchemaEvolution ( #2127 )
...
Unit test different schema evolution scenarios.
2020-10-19 07:38:04 -07:00
satishkotha
0d407342ef
[HUDI-1304] Add unit test for testing compaction on replaced file groups ( #2150 )
2020-10-12 16:48:29 -07:00
Raymond Xu
c5e10d668f
[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable ( #2167 )
...
Remove APIs in `HoodieTestUtils`
- `createCommitFiles`
- `createDataFile`
- `createNewLogFile`
- `createCompactionRequest`
Migrated usages in `TestCleaner#testPendingCompactions`.
Also improved some API names in `HoodieTestTable`.
2020-10-12 14:39:10 +08:00
hj2016
c0472d3317
[HUDI-1184] Fix the support of hbase index partition path change ( #1978 )
...
With the HBase index, when a record's partition changes to another partition, the path is not updated to match the new value of the partition column
Co-authored-by: huangjing <huangjing@clinbrain.com >
2020-10-11 19:05:57 -07:00
dugenkui
b58daf29ba
[MINOR] remove unused generics type ( #2163 )
2020-10-11 18:38:42 -07:00
Raymond Xu
1d1d91d444
[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable ( #2143 )
...
* [HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable
Remove APIs in `HoodieTestUtils`
- listAllDataFilesAndLogFilesInPath
- listAllLogFilesInPath
- listAllDataFilesInPath
- writeRecordsToLogFiles
- createCleanFiles
- createPendingCleanFiles
Migrate the callers to use `HoodieTestTable` and `HoodieWriteableTestTable` with new APIs added
- listAllBaseAndLogFiles
- listAllLogFiles
- listAllBaseFiles
- withLogAppends
- addClean
- addInflightClean
Also added related APIs in `FileCreateUtils`
- createCleanFile
- createRequestedCleanFile
- createInflightCleanFile
2020-10-09 10:21:27 +08:00
Pratyaksh Sharma
524193eb4b
[HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode ( #1566 )
...
Co-authored-by: Balaji Varadarajan <balaji.varadarajan@robinhood.com >
2020-10-06 20:34:03 -07:00
Mathieu
1f7add9291
[HUDI-1089] Refactor hudi-client to support multi-engine ( #1827 )
...
- This change breaks `hudi-client` into `hudi-client-common` and `hudi-spark-client` modules
- Simple usages of Spark using jsc.parallelize() have been redone using EngineContext#map, EngineContext#flatMap etc
- Code changes in the PR break classes into `BaseXYZ` parent classes with no Spark dependencies, living in `hudi-client-common`
- Classes in `hudi-spark-client` are named `SparkXYZ`, extending the parent classes with all the Spark dependencies
- To simplify/cleanup, HoodieIndex#fetchRecordLocation has been removed and its usages in tests replaced with alternatives
Co-authored-by: Vinoth Chandar <vinoth@apache.org >
2020-10-01 14:25:29 -07:00