Gary Li
c5e8a024f6
[HUDI-1418] Set up flink client unit test infra ( #2281 )
2020-12-31 08:57:22 +08:00
Gary Li
605b617cfa
[HUDI-1434] fix incorrect log file path in HoodieWriteStat ( #2300 )
...
* [HUDI-1434] fix incorrect log file path in HoodieWriteStat
* HoodieWriteHandle#close() returns a list of WriteStatus objects
* Handle rolled-over log files and return a WriteStatus per log file written
- Combined data and delete block logging into a single call
- Lazily initialize and manage write status based on returned AppendResult
- Use FSUtils.getFileSize() to set final file size, consistent with other handles
- Added tests around returned values in AppendResult
- Added validation of the file sizes returned in write stat
Co-authored-by: Vinoth Chandar <vinoth@apache.org >
2020-12-30 14:22:15 -08:00
Prashant Wason
c6bf952332
[HUDI-1493] Fixed schema compatibility check for fields. ( #2350 )
...
Some field type changes are allowed (e.g. int -> long) while maintaining schema backward compatibility within HUDI. The check was reversed: the reader schema was being passed in place of the writer schema.
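The direction of the check described above can be illustrated with a minimal sketch. This is not Hudi's implementation: the names `PROMOTIONS`, `can_read`, and `is_field_change_allowed` are hypothetical, and the promotion table only mirrors the primitive promotions listed in the Avro specification's schema resolution rules.

```python
# Primitive promotions from the Avro schema resolution rules:
# data written as the key type may be read as any type in the value set.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def can_read(writer_type: str, reader_type: str) -> bool:
    """Return True if data written as writer_type can be read as reader_type."""
    if writer_type == reader_type:
        return True
    return reader_type in PROMOTIONS.get(writer_type, set())


def is_field_change_allowed(old_type: str, new_type: str) -> bool:
    """Hypothetical helper: existing data was written with old_type, so the
    old type plays the writer role and the new type plays the reader role."""
    return can_read(writer_type=old_type, reader_type=new_type)


# Widening int -> long is allowed; narrowing long -> int is not.
print(is_field_change_allowed("int", "long"))
print(is_field_change_allowed("long", "int"))
```

Swapping the two arguments, which is the kind of reversal this commit describes, would reject the widening change and accept the narrowing one.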
2020-12-29 20:02:21 -05:00
Balajee Nagasubramaniam
da51aa64fc
[HUDI-1474] Add additional unit tests to TestHBaseIndex ( #2349 )
2020-12-28 23:04:38 -05:00
steven zhang
4c17528de0
[HUDI-1398] Align insert file size for reducing IO ( #2256 )
...
* [HUDI-1398] Align insert file size for reducing IO
Co-authored-by: zhang wen <wen.zhang@dmall.com >
2020-12-28 22:52:35 -05:00
Danny Chan
0ecdec348e
[MINOR] Remove the duplicate code in AbstractHoodieWriteClient.startCommit ( #2385 )
2020-12-29 10:49:24 +08:00
lw0090
e177466fd2
[HUDI-1350] Support Partition level delete API in HUDI ( #2254 )
...
* [HUDI-1350] Support Partition level delete API in HUDI
* [HUDI-1350] Support Partition level delete API in HUDI based on InsertOverwriteCommitAction
2020-12-28 15:01:06 -08:00
lw0090
6cdf59d92b
[HUDI-1354] Block updates and replace on file groups in clustering ( #2275 )
...
* [HUDI-1354] Block updates and replace on file groups in clustering
2020-12-27 20:30:29 -08:00
wenningd
286055ce34
[HUDI-1451] Support bulk insert v2 with Spark 3.0.0 ( #2328 )
...
Co-authored-by: Wenning Ding <wenningd@amazon.com >
- Added support for bulk insert v2 with datasource v2 api in Spark 3.0.0.
2020-12-25 09:43:34 -05:00
pengzhiwei
38b9264dd0
[HUDI-1488] Fix Test Case Failure in TestHBaseIndex ( #2365 )
2020-12-23 16:47:38 +08:00
wangxianghu
01ad449ad6
[HUDI-1485] Fix Deletes issued without any prior commits exception ( #2361 )
2020-12-22 23:10:19 +08:00
satishkotha
959afb8ba4
Merge pull request #2263 from satishkotha/sk/clustering
...
[HUDI-1075] Implement simple clustering strategies to create and run ClusteringPlan
2020-12-21 19:18:18 -08:00
Satish Kotha
6dc03b65bf
[HUDI-1075] Implement simple clustering strategies to create ClusteringPlan and to run the plan
2020-12-21 17:34:15 -08:00
jshmchenxi
0c821fecc2
[MINOR] Pass root exception to HoodieKeyGeneratorException for more information ( #2354 )
...
Co-authored-by: Xi Chen <chenxi07@qiyi.com >
2020-12-22 09:02:23 +08:00
Shen Hong
e4e2fbc3bb
[HUDI-1419] Add base implementation for hudi java client ( #2286 )
2020-12-19 19:25:27 -08:00
Sivabalan Narayanan
33d338f392
[HUDI-115] Adding DefaultHoodieRecordPayload to honor ordering with combineAndGetUpdateValue ( #2311 )
...
* Added ability to pass in `properties` to payload methods, so they can perform table/record specific merges
* Added default methods so existing payload classes are backwards compatible.
* Adding DefaultHoodieRecordPayload to honor ordering while merging two records
* Fixing default payload based on feedback
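The idea of honoring ordering while merging two records can be sketched as follows. This is only an illustration under stated assumptions, not the actual DefaultHoodieRecordPayload code: the function name, the dictionary-based records, and the `ordering.field` property key are all hypothetical.

```python
from typing import Any, Mapping


def combine_and_get_update_value(
    stored: Mapping[str, Any],
    incoming: Mapping[str, Any],
    properties: Mapping[str, str],
) -> Mapping[str, Any]:
    """Merge an incoming record with the stored one, honoring ordering.

    The ordering field name is read from `properties` (hypothetical key
    "ordering.field"), so the merge can be configured per table.
    """
    ordering_field = properties["ordering.field"]
    # Keep the stored record only when it is strictly newer than the incoming one.
    if stored[ordering_field] > incoming[ordering_field]:
        return stored
    return incoming


props = {"ordering.field": "ts"}
stored = {"id": "a", "ts": 10, "value": "new"}
late_arrival = {"id": "a", "ts": 5, "value": "stale"}

# The late-arriving record has a smaller ordering value, so the stored record wins.
print(combine_and_get_update_value(stored, late_arrival, props)["value"])
```

Without the ordering comparison, a late-arriving record would overwrite the newer stored value simply because it was written last.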
2020-12-19 19:19:42 -08:00
lw0090
8b5d6f9430
[HUDI-1437] support more accurate spark JobGroup for better performance tracking ( #2322 )
2020-12-17 15:20:13 -08:00
Danny Chan
93d9c25aee
[MINOR] Improve code readability by passing in the fileComparisonsRDD in bloom index ( #2319 )
2020-12-14 22:35:24 -08:00
Balaji Varadarajan
069a1dcf24
[HUDI-1435] Fix bug in Marker File Reconciliation for Non-Partitioned datasets ( #2301 )
2020-12-14 22:24:12 -08:00
steven zhang
11bc1fe6f4
[HUDI-1428] Clean old fileslice is invalid ( #2292 )
...
Co-authored-by: zhang wen <wen.zhang@dmall.com >
Co-authored-by: zhang wen <steven@stevendeMac-mini.local >
2020-12-13 06:28:53 -08:00
Shen Hong
236d1b0dec
[HUDI-1439] Remove scala dependency from hudi-client-common ( #2306 )
2020-12-11 00:36:37 -08:00
Danny Chan
4bc45a391a
[HUDI-1445] Refactor AbstractHoodieLogRecordScanner to use Builder ( #2313 )
2020-12-10 20:02:02 +08:00
wenningd
fce1453fa6
[HUDI-1040] Make Hudi support Spark 3 ( #2208 )
...
* Fix flaky MOR unit test
* Update Spark APIs to be compatible with both Spark 2 and Spark 3
* Refactor the bulk insert v2 code so that Hudi compiles with Spark 3
* Add spark3 profile to handle fasterxml & spark version
* Create hudi-spark-common module & refactor hudi-spark related modules
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2020-12-09 15:52:23 -08:00
wangxianghu
de2fbeac33
[HUDI-1412] Make HoodieWriteConfig support setting different default … ( #2278 )
...
* [HUDI-1412] Make HoodieWriteConfig support setting different default value according to engine type
2020-12-07 09:29:53 +08:00
lw0090
1f0d5c077e
[HUDI-1349] spark sql support overwrite use insert_overwrite_table ( #2196 )
2020-12-03 12:26:21 -08:00
rmpifer
78fd122594
[HUDI-1196] Update HoodieKey when deduplicating records with global index ( #2248 )
...
- Works only for overwrite payload (default)
- Does not alter current semantics otherwise
Co-authored-by: Ryan Pifer <ryanpife@amazon.com >
2020-12-01 13:50:46 -08:00
Prashant Wason
ac23d2587f
[HUDI-1357] Added a check to validate records are not lost during merges. ( #2216 )
...
- Turned off by default
2020-12-01 13:44:57 -08:00
leesf
3d5e9fee7f
[MINOR] refactor code in HoodieMergeHandle ( #2272 )
2020-11-28 21:47:05 +08:00
Balaji Varadarajan
0ebef1c0a0
[HUDI-1358] Fix leaks in DiskBasedMap and LazyFileIterable ( #2249 )
2020-11-23 10:56:26 -08:00
Shen Hong
d9411c38db
[HUDI-1364] Add HoodieJavaEngineContext to hudi-java-client ( #2222 )
2020-11-23 10:06:28 -08:00
Gary Li
c8d5ea2752
[MINOR] clean up and add comments to flink client ( #2261 )
2020-11-19 15:27:52 +08:00
wangxianghu
4d05680038
[HUDI-1327] Introduce base implementation of hudi-flink-client ( #2176 )
2020-11-18 17:57:11 +08:00
Balaji Varadarajan
42b6aeca28
[HUDI-1358] Fix Memory Leak in HoodieLogFormatWriter ( #2217 )
2020-11-09 19:26:13 -08:00
wangxianghu
d160abb437
[HUDI-912] Refactor and relocate KeyGenerator to support more engines ( #2200 )
...
* [HUDI-912] Refactor and relocate KeyGenerator to support more engines
* Rename KeyGenerators
2020-11-02 13:12:51 -08:00
Venkatesh Rudraraju
59f995a3f5
Use RateLimiter instead of sleep. Repartition WriteStatus to optimize HBase index writes ( #1484 )
2020-11-02 08:33:27 -08:00
wangxianghu
e206ddd431
[MINOR] Make the NoArgsConstructor of SparkMergeHelper private and clean up code ( #2194 )
2020-10-26 12:22:11 +08:00
lw0090
8545ea3856
[HUDI-1118] Cleanup rollback files residing in .hoodie folder ( #2205 )
2020-10-25 21:04:56 -07:00
Prashant Wason
49e855c348
[HUDI-1326] Added an API to force publish metrics and flush them. ( #2152 )
...
* [HUDI-1326] Added an API to force publish metrics and flush them.
Using the added API, publish metrics after each level of the DAG completes in hudi-test-suite.
* Code cleanups
Co-authored-by: Vinoth Chandar <vinoth@apache.org >
2020-10-24 16:47:24 -07:00
Raymond Xu
14c4611857
[MINOR] Fix caller to SparkBulkInsertCommitActionExecutor ( #2195 )
...
Fixed calling the wrong constructor
2020-10-21 19:50:10 -07:00
lw0090
4d80e1e221
[HUDI-284] Add more tests for UpdateSchemaEvolution ( #2127 )
...
Unit test different schema evolution scenarios.
2020-10-19 07:38:04 -07:00
wangxianghu
c7d962efff
[HUDI-1328] Introduce HoodieFlinkEngineContext to hudi-flink-client ( #2161 )
2020-10-14 09:30:49 +08:00
satishkotha
0d407342ef
[HUDI-1304] Add unit test for testing compaction on replaced file groups ( #2150 )
2020-10-12 16:48:29 -07:00
Raymond Xu
c5e10d668f
[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable ( #2167 )
...
Remove APIs in `HoodieTestUtils`
- `createCommitFiles`
- `createDataFile`
- `createNewLogFile`
- `createCompactionRequest`
Migrated usages in `TestCleaner#testPendingCompactions`.
Also improved some API names in `HoodieTestTable`.
2020-10-12 14:39:10 +08:00
hj2016
c0472d3317
[HUDI-1184] Fix the support of hbase index partition path change ( #1978 )
...
When the HBase index is used and a record moves to a different partition, the path is not updated to match the new value of the partition column.
Co-authored-by: huangjing <huangjing@clinbrain.com >
2020-10-11 19:05:57 -07:00
dugenkui
b58daf29ba
[MINOR] remove unused generics type ( #2163 )
2020-10-11 18:38:42 -07:00
vinoyang
eafd7bf289
[MINOR] Fix wrong javadoc and refactor some naming issues ( #2156 )
2020-10-09 15:09:26 -07:00
Raymond Xu
1d1d91d444
[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable ( #2143 )
...
* [HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable
Remove APIs in `HoodieTestUtils`
- listAllDataFilesAndLogFilesInPath
- listAllLogFilesInPath
- listAllDataFilesInPath
- writeRecordsToLogFiles
- createCleanFiles
- createPendingCleanFiles
Migrate the callers to use `HoodieTestTable` and `HoodieWriteableTestTable` with new APIs added
- listAllBaseAndLogFiles
- listAllLogFiles
- listAllBaseFiles
- withLogAppends
- addClean
- addInflightClean
Also added related APIs in `FileCreateUtils`
- createCleanFile
- createRequestedCleanFile
- createInflightCleanFile
2020-10-09 10:21:27 +08:00
Pratyaksh Sharma
524193eb4b
[HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode ( #1566 )
...
Co-authored-by: Balaji Varadarajan <balaji.varadarajan@robinhood.com >
2020-10-06 20:34:03 -07:00
lw0090
fdae388626
[HUDI-1203] add port configuration for EmbeddedTimelineService ( #2142 )
2020-10-05 11:36:54 -07:00
Prashant Wason
6c610b91ef
[HUDI-1305] Added an API to shutdown and remove the metrics reporter. ( #2132 )
...
This helps remove the reporter once the test has completed and prevents log pollution from unnecessary metric logs.
- Added an API to shutdown the metrics reporter after tests.
2020-10-04 09:30:04 -07:00