lw0090
e177466fd2
[HUDI-1350] Support Partition level delete API in HUDI ( #2254 )
* [HUDI-1350] Support Partition level delete API in HUDI base InsertOverwriteCommitAction
2020-12-28 15:01:06 -08:00
lw0090
6cdf59d92b
[HUDI-1354] Block updates and replace on file groups in clustering ( #2275 )
2020-12-27 20:30:29 -08:00
wenningd
286055ce34
[HUDI-1451] Support bulk insert v2 with Spark 3.0.0 ( #2328 )
Co-authored-by: Wenning Ding <wenningd@amazon.com>
- Added support for bulk insert v2 with datasource v2 api in Spark 3.0.0.
2020-12-25 09:43:34 -05:00
pengzhiwei
38b9264dd0
[HUDI-1488] Fix Test Case Failure in TestHBaseIndex ( #2365 )
2020-12-23 16:47:38 +08:00
wangxianghu
01ad449ad6
[HUDI-1485] Fix Deletes issued without any prior commits exception ( #2361 )
2020-12-22 23:10:19 +08:00
satishkotha
959afb8ba4
Merge pull request #2263 from satishkotha/sk/clustering
[HUDI-1075] Implement simple clustering strategies to create and run ClusteringPlan
2020-12-21 19:18:18 -08:00
Satish Kotha
6dc03b65bf
[HUDI-1075] Implement simple clustering strategies to create ClusteringPlan and to run the plan
2020-12-21 17:34:15 -08:00
jshmchenxi
0c821fecc2
[MINOR] Pass root exception to HoodieKeyGeneratorException for more information ( #2354 )
Co-authored-by: Xi Chen <chenxi07@qiyi.com>
2020-12-22 09:02:23 +08:00
Shen Hong
e4e2fbc3bb
[HUDI-1419] Add base implementation for hudi java client ( #2286 )
2020-12-19 19:25:27 -08:00
Sivabalan Narayanan
33d338f392
[HUDI-115] Adding DefaultHoodieRecordPayload to honor ordering with combineAndGetUpdateValue ( #2311 )
* Added ability to pass in `properties` to payload methods, so they can perform table/record specific merges
* Added default methods so existing payload classes are backwards compatible.
* Adding DefaultHoodieRecordPayload to honor ordering while merging two records
* Fixing default payload based on feedback
2020-12-19 19:19:42 -08:00
lw0090
8b5d6f9430
[HUDI-1437] support more accurate spark JobGroup for better performance tracking ( #2322 )
2020-12-17 15:20:13 -08:00
Danny Chan
93d9c25aee
[MINOR] Improve code readability by passing in the fileComparisonsRDD in bloom index ( #2319 )
2020-12-14 22:35:24 -08:00
Balaji Varadarajan
069a1dcf24
[HUDI-1435] Fix bug in Marker File Reconciliation for Non-Partitioned datasets ( #2301 )
2020-12-14 22:24:12 -08:00
steven zhang
11bc1fe6f4
[HUDI-1428] Clean old fileslice is invalid ( #2292 )
Co-authored-by: zhang wen <wen.zhang@dmall.com>
Co-authored-by: zhang wen <steven@stevendeMac-mini.local>
2020-12-13 06:28:53 -08:00
Shen Hong
236d1b0dec
[HUDI-1439] Remove scala dependency from hudi-client-common ( #2306 )
2020-12-11 00:36:37 -08:00
Danny Chan
4bc45a391a
[HUDI-1445] Refactor AbstractHoodieLogRecordScanner to use Builder ( #2313 )
2020-12-10 20:02:02 +08:00
wenningd
fce1453fa6
[HUDI-1040] Make Hudi support Spark 3 ( #2208 )
* Fix flaky MOR unit test
* Update Spark API usage to be compatible with both Spark 2 and Spark 3
* Refactor bulk insert v2 part so that Hudi can compile with Spark 3
* Add spark3 profile to handle fasterxml & spark version
* Create hudi-spark-common module & refactor hudi-spark related modules
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-09 15:52:23 -08:00
wangxianghu
de2fbeac33
[HUDI-1412] Make HoodieWriteConfig support setting different default … ( #2278 )
* [HUDI-1412] Make HoodieWriteConfig support setting different default value according to engine type
2020-12-07 09:29:53 +08:00
lw0090
1f0d5c077e
[HUDI-1349] spark sql support overwrite use insert_overwrite_table ( #2196 )
2020-12-03 12:26:21 -08:00
rmpifer
78fd122594
[HUDI-1196] Update HoodieKey when deduplicating records with global index ( #2248 )
- Works only for overwrite payload (default)
- Does not alter current semantics otherwise
Co-authored-by: Ryan Pifer <ryanpife@amazon.com>
2020-12-01 13:50:46 -08:00
Prashant Wason
ac23d2587f
[HUDI-1357] Added a check to validate records are not lost during merges. ( #2216 )
- Turned off by default
2020-12-01 13:44:57 -08:00
leesf
3d5e9fee7f
[MINOR] refactor code in HoodieMergeHandle ( #2272 )
2020-11-28 21:47:05 +08:00
Balaji Varadarajan
0ebef1c0a0
[HUDI-1358] Fix leaks in DiskBasedMap and LazyFileIterable ( #2249 )
2020-11-23 10:56:26 -08:00
Shen Hong
d9411c38db
[HUDI-1364] Add HoodieJavaEngineContext to hudi-java-client ( #2222 )
2020-11-23 10:06:28 -08:00
Gary Li
c8d5ea2752
[MINOR] clean up and add comments to flink client ( #2261 )
2020-11-19 15:27:52 +08:00
wangxianghu
4d05680038
[HUDI-1327] Introduce base implementation of hudi-flink-client ( #2176 )
2020-11-18 17:57:11 +08:00
Balaji Varadarajan
42b6aeca28
[HUDI-1358] Fix Memory Leak in HoodieLogFormatWriter ( #2217 )
2020-11-09 19:26:13 -08:00
wangxianghu
d160abb437
[HUDI-912] Refactor and relocate KeyGenerator to support more engines ( #2200 )
* Rename KeyGenerators
2020-11-02 13:12:51 -08:00
Venkatesh Rudraraju
59f995a3f5
Use RateLimiter instead of sleep. Repartition WriteStatus to optimize Hbase index writes ( #1484 )
2020-11-02 08:33:27 -08:00
wangxianghu
e206ddd431
[MINOR] Private the NoArgsConstructor of SparkMergeHelper and code clean ( #2194 )
2020-10-26 12:22:11 +08:00
lw0090
8545ea3856
[HUDI-1118] Cleanup rollback files residing in .hoodie folder ( #2205 )
2020-10-25 21:04:56 -07:00
Prashant Wason
49e855c348
[HUDI-1326] Added an API to force publish metrics and flush them. ( #2152 )
Using the added API, publish metrics after each level of the DAG completes in hudi-test-suite.
* Code cleanups
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-10-24 16:47:24 -07:00
Raymond Xu
14c4611857
[MINOR] Fix caller to SparkBulkInsertCommitActionExecutor ( #2195 )
Fixed calling the wrong constructor
2020-10-21 19:50:10 -07:00
lw0090
4d80e1e221
[HUDI-284] add more test for UpdateSchemaEvolution ( #2127 )
Unit test different schema evolution scenarios.
2020-10-19 07:38:04 -07:00
wangxianghu
c7d962efff
[HUDI-1328] Introduce HoodieFlinkEngineContext to hudi-flink-client ( #2161 )
2020-10-14 09:30:49 +08:00
satishkotha
0d407342ef
[HUDI-1304] Add unit test for testing compaction on replaced file groups ( #2150 )
2020-10-12 16:48:29 -07:00
Raymond Xu
c5e10d668f
[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable ( #2167 )
Remove APIs in `HoodieTestUtils`
- `createCommitFiles`
- `createDataFile`
- `createNewLogFile`
- `createCompactionRequest`
Migrated usages in `TestCleaner#testPendingCompactions`.
Also improved some API names in `HoodieTestTable`.
2020-10-12 14:39:10 +08:00
hj2016
c0472d3317
[HUDI-1184] Fix the support of hbase index partition path change ( #1978 )
When the HBase index is used and a record's partition changes to another partition, the path is not updated to match the new value of the partition column.
Co-authored-by: huangjing <huangjing@clinbrain.com>
2020-10-11 19:05:57 -07:00
dugenkui
b58daf29ba
[MINOR] remove unused generics type ( #2163 )
2020-10-11 18:38:42 -07:00
vinoyang
eafd7bf289
[MINOR] Fix wrong javadoc and refactor some naming issues ( #2156 )
2020-10-09 15:09:26 -07:00
Raymond Xu
1d1d91d444
[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable ( #2143 )
Remove APIs in `HoodieTestUtils`
- listAllDataFilesAndLogFilesInPath
- listAllLogFilesInPath
- listAllDataFilesInPath
- writeRecordsToLogFiles
- createCleanFiles
- createPendingCleanFiles
Migrate the callers to use `HoodieTestTable` and `HoodieWriteableTestTable` with new APIs added
- listAllBaseAndLogFiles
- listAllLogFiles
- listAllBaseFiles
- withLogAppends
- addClean
- addInflightClean
Also added related APIs in `FileCreateUtils`
- createCleanFile
- createRequestedCleanFile
- createInflightCleanFile
2020-10-09 10:21:27 +08:00
Pratyaksh Sharma
524193eb4b
[HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode ( #1566 )
Co-authored-by: Balaji Varadarajan <balaji.varadarajan@robinhood.com>
2020-10-06 20:34:03 -07:00
lw0090
fdae388626
[HUDI-1203] add port configuration for EmbeddedTimelineService ( #2142 )
2020-10-05 11:36:54 -07:00
Prashant Wason
6c610b91ef
[HUDI-1305] Added an API to shutdown and remove the metrics reporter. ( #2132 )
This helps remove the reporter once the test has completed and prevents log pollution from unnecessary metric logs.
- Added an API to shutdown the metrics reporter after tests.
2020-10-04 09:30:04 -07:00
Mathieu
1f7add9291
[HUDI-1089] Refactor hudi-client to support multi-engine ( #1827 )
- This change breaks `hudi-client` into `hudi-client-common` and `hudi-spark-client` modules
- Simple usages of Spark via jsc.parallelize() have been redone using EngineContext#map, EngineContext#flatMap, etc.
- Code changes in the PR break classes into `BaseXYZ` parent classes with no Spark dependencies, living in `hudi-client-common`
- Classes in `hudi-spark-client` are named `SparkXYZ` and extend the parent classes with all the Spark dependencies
- To simplify/cleanup, HoodieIndex#fetchRecordLocation has been removed and its usages in tests replaced with alternatives
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-10-01 14:25:29 -07:00
satishkotha
a99e93bed5
[HUDI-1072] Introduce REPLACE top level action. Implement insert_overwrite operation on top of replace action ( #2048 )
2020-09-29 17:04:25 -07:00
Raymond Xu
1be0b06ef8
[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable ( #2112 )
Remove APIs in HoodieTestUtils
- HoodieTestUtils#createInflightCommitFiles
- HoodieTestUtils#getCommitFilePath
- HoodieTestUtils#doesCommitExist
and migrate usages to HoodieTestTable in
- hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRollbacksCommand.java
- hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestUpgradeDowngradeCommand.java
- hudi-cli/src/test/java/org/apache/hudi/cli/integ/ITTestCommitsCommand.java
- hudi-cli/src/test/java/org/apache/hudi/cli/testutils/HoodieTestCommitMetadataGenerator.java
- hudi-client/src/test/java/org/apache/hudi/client/TestHoodieClientOnCopyOnWriteStorage.java
2020-09-26 21:21:47 +08:00
dugenkui
ae68b2b355
[MINOR] fix typos ( #2116 )
2020-09-26 20:40:33 +08:00
dugenkui
6837118c21
[MINOR] Improve description ( #2113 )
2020-09-25 22:21:37 +08:00
lw0090
fcc497eff1
[HUDI-1268] fix UpgradeDowngrade fs Rename issue for hdfs and aliyun oss ( #2099 )
2020-09-22 09:57:20 -07:00