lw0090
8b5d6f9430
[HUDI-1437] Support more accurate Spark JobGroup for better performance tracking (#2322)
2020-12-17 15:20:13 -08:00
Bhavani Sudha Saktheeswaran
14d5d1100c
[HUDI-1406] Add date partition based source input selector for Delta streamer (#2264)
- Adds the ability to list only recent date-based partitions from source data.
- Parallelizes listing for faster tailing of DFSSources
2020-12-17 03:59:30 -08:00
wangxianghu
4ddfc61d70
[MINOR] Make QuickstartUtil generate random timestamp instead of 0 (#2340)
2020-12-17 18:00:23 +08:00
ChangLi
6a6b772c49
[MINOR] Fix error information in exception (#2341)
2020-12-16 19:37:01 +08:00
wenningd
26cdc457f6
[HUDI-1376] Drop Hudi metadata cols at the beginning of Spark datasource writing (#2233)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-15 16:20:48 -08:00
Danny Chan
93d9c25aee
[MINOR] Improve code readability by passing in the fileComparisonsRDD in bloom index (#2319)
2020-12-14 22:35:24 -08:00
Balaji Varadarajan
069a1dcf24
[HUDI-1435] Fix bug in Marker File Reconciliation for Non-Partitioned datasets (#2301)
2020-12-14 22:24:12 -08:00
lw0090
facde4c16f
[HUDI-1448] Hudi DLA sync: support skipping RT table syncing (#2324)
2020-12-14 23:25:10 +08:00
steven zhang
11bc1fe6f4
[HUDI-1428] Clean old fileslice is invalid (#2292)
Co-authored-by: zhang wen <wen.zhang@dmall.com>
Co-authored-by: zhang wen <steven@stevendeMac-mini.local>
2020-12-13 06:28:53 -08:00
Shen Hong
236d1b0dec
[HUDI-1439] Remove scala dependency from hudi-client-common (#2306)
2020-12-11 00:36:37 -08:00
wangxianghu
6cf25d5c8a
[MINOR] Minor improvement in IncrementalRelation (#2314)
2020-12-10 20:16:00 +08:00
Danny Chan
4bc45a391a
[HUDI-1445] Refactor AbstractHoodieLogRecordScanner to use Builder (#2313)
2020-12-10 20:02:02 +08:00
Raymond Xu
bd9cceccb5
[HUDI-1395] Fix partition path using FSUtils (#2312)
Fixed the logic to get partition path in Copier and Exporter utilities.
2020-12-10 10:19:19 +08:00
wangxianghu
007014c1ef
[MINOR] Throw an exception when keyGenerator initialization fails (#2307)
2020-12-10 09:56:19 +08:00
wenningd
fce1453fa6
[HUDI-1040] Make Hudi support Spark 3 (#2208)
* Fix flaky MOR unit test
* Update Spark APIs to make them compatible with both Spark 2 and Spark 3
* Refactor the bulk insert v2 code so that Hudi can compile with Spark 3
* Add a spark3 profile to handle fasterxml and Spark versions
* Create hudi-spark-common module & refactor hudi-spark related modules
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-09 15:52:23 -08:00
jshmchenxi
3a91d26d62
Fix typo (#2308)
Co-authored-by: Xi Chen <chenxi07@qiyi.com>
2020-12-08 06:28:20 -08:00
wangxianghu
de2fbeac33
[HUDI-1412] Make HoodieWriteConfig support setting different default … (#2278)
* [HUDI-1412] Make HoodieWriteConfig support setting different default values according to engine type
2020-12-07 09:29:53 +08:00
pengzhiwei
319b7a58e4
[HUDI-1427] Fix FileAlreadyExistsException when HOODIE_AUTO_COMMIT_PROP is set to true (#2295)
2020-12-05 08:07:25 +08:00
liujinhui
62b392b49c
[HUDI-1343] Add standard schema postprocessor which would rewrite the schema using spark-avro conversion (#2192)
Co-authored-by: liujh <liujh@t3go.cn>
2020-12-03 19:28:34 -08:00
lw0090
1f0d5c077e
[HUDI-1349] Spark SQL: support overwrite using insert_overwrite_table (#2196)
2020-12-03 12:26:21 -08:00
rmpifer
78fd122594
[HUDI-1196] Update HoodieKey when deduplicating records with global index (#2248)
- Works only for overwrite payload (default)
- Does not alter current semantics otherwise
Co-authored-by: Ryan Pifer <ryanpife@amazon.com>
2020-12-01 13:50:46 -08:00
Prashant Wason
ac23d2587f
[HUDI-1357] Added a check to validate records are not lost during merges (#2216)
- Turned off by default
2020-12-01 13:44:57 -08:00
Guy Khazma
b826c53e33
[HUDI-1373] Add Support for OpenJ9 JVM (#2231)
* Add support for OpenJ9 VM
* Add 32-bit OpenJ9
* Pulled the memory layout specs into their own classes.
2020-12-01 13:19:40 -08:00
pengzhiwei
36ce5bcd92
[HUDI-1424] Write Type changed to BULK_INSERT when ENABLE_ROW_WRITER_OPT_KEY=true is set (#2289)
2020-11-30 23:07:21 +08:00
leesf
3d5e9fee7f
[MINOR] Refactor code in HoodieMergeHandle (#2272)
2020-11-28 21:47:05 +08:00
steven zhang
56866a11fe
[HUDI-1392] Partition info lost when using Spark parameter basePath (#2243)
Co-authored-by: zhang wen <wen.zhang@dmall.com>
2020-11-25 11:55:33 +08:00
Balaji Varadarajan
0ebef1c0a0
[HUDI-1358] Fix leaks in DiskBasedMap and LazyFileIterable (#2249)
2020-11-23 10:56:26 -08:00
wenningd
751e4ee882
[HUDI-1396] Fix for preventing bootstrap datasource jobs from hanging via spark-submit (#2253)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-11-23 10:43:24 -08:00
Shen Hong
d9411c38db
[HUDI-1364] Add HoodieJavaEngineContext to hudi-java-client (#2222)
2020-11-23 10:06:28 -08:00
hongdd
971f028aaf
[HUDI-1393] Add compaction action in archive command (#2246)
2020-11-23 16:53:01 +08:00
wangxianghu
537502a8ef
[MINOR] Add apacheflink label (#2268)
2020-11-22 10:41:11 +08:00
Gary Li
c8d5ea2752
[MINOR] Clean up and add comments to flink client (#2261)
2020-11-19 15:27:52 +08:00
pengzhiwei
d7af8caa45
[HUDI-1384] Decouple Hive JDBC dependency when HIVE_USE_JDBC_OPT_KEY is set to false (#2241)
2020-11-19 13:44:03 +08:00
wangxianghu
a23230c8c2
[HUDI-1400] Replace Operation enum with WriteOperationType (#2259)
2020-11-19 13:40:04 +08:00
wangxianghu
4d05680038
[HUDI-1327] Introduce base implementation of hudi-flink-client (#2176)
2020-11-18 17:57:11 +08:00
Karl_Wang
430d4b428e
[HUDI-1377] Remove duplicate code (#2235)
2020-11-10 10:08:08 -08:00
Balaji Varadarajan
42b6aeca28
[HUDI-1358] Fix Memory Leak in HoodieLogFormatWriter (#2217)
2020-11-09 19:26:13 -08:00
wenningd
0364498ae3
[HUDI-1375] Fix bug in HoodieAvroUtils.removeMetadataFields() method (#2232)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-11-05 17:30:17 -08:00
satishkotha
33ec88fc38
[HUDI-1352] Add FileSystemView APIs to query pending clustering operations (#2202)
2020-11-05 08:49:58 -08:00
lw0090
5f5c15b0d9
[HUDI-892] RealtimeParquetInputFormat skip adding projection columns if there are no log files (#2190)
* [HUDI-892] RealtimeParquetInputFormat skip adding projection columns if there are no log files
* [HUDI-892] for test
* [HUDI-892] Fix bug generating array from split
* [HUDI-892] revert test log
2020-11-02 20:00:12 -08:00
wangxianghu
d160abb437
[HUDI-912] Refactor and relocate KeyGenerator to support more engines (#2200)
* [HUDI-912] Refactor and relocate KeyGenerator to support more engines
* Rename KeyGenerators
2020-11-02 13:12:51 -08:00
Venkatesh Rudraraju
59f995a3f5
Use RateLimiter instead of sleep. Repartition WriteStatus to optimize HBase index writes (#1484)
2020-11-02 08:33:27 -08:00
Sivabalan Narayanan
a205dd10fa
[HUDI-1338] Adding Delete support to test suite framework (#2172)
- Adding Delete support to test suite.
  - Added DeleteNode
  - Added support to generate delete records
2020-11-01 00:15:41 -04:00
Prashant Wason
6310a2307a
[HUDI-1351] Improvements to the hudi test suite for scalability and repeated testing. (#2197)
1. Added the --clean-input and --clean-output parameters to clean the input and output directories before starting the job
2. Added the --delete-old-input parameter to delete older batches for data already ingested. This helps keep the number of redundant files low.
3. Added the --input-parallelism parameter to restrict the parallelism when generating input data. This helps keep the number of generated input files low.
4. Added an option start_offset to Dag Nodes. Without the ability to specify start offsets, data is generated into existing partitions. With a start offset, the DAG can control which partition the data is written to.
5. Fixed generation of records for the correct number of partitions
- In the existing implementation, the partition is chosen as a random long. This does not guarantee that the exact number of requested partitions is created.
6. Changed the variable blacklistedFields to a Set, since a Set is faster than a List for membership checks.
7. Fixed integer division for Math.ceil. If two integers are divided, the result is not a double unless one of the integers is cast to double.
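Item 7 describes a common Java pitfall. The snippet below is a minimal, self-contained illustration of it; the class and method names (`CeilDivisionExample`, `buggyCeil`, `fixedCeil`) are hypothetical and are not taken from the test-suite code, so treat it as a sketch of why the cast matters rather than the actual fix.

```java
public class CeilDivisionExample {

    // Buggy: total / perBatch is evaluated as integer division first
    // (7 / 2 == 3), so Math.ceil receives 3.0 and has nothing to round up.
    public static long buggyCeil(int total, int perBatch) {
        return (long) Math.ceil(total / perBatch);
    }

    // Fixed: casting one operand to double forces floating-point division
    // (7.0 / 2 == 3.5), which Math.ceil then rounds up to 4.0.
    public static long fixedCeil(int total, int perBatch) {
        return (long) Math.ceil((double) total / perBatch);
    }

    public static void main(String[] args) {
        System.out.println(buggyCeil(7, 2)); // prints 3
        System.out.println(fixedCeil(7, 2)); // prints 4
    }
}
```

Note that the cast must apply to an operand, not to the quotient: `(double) (total / perBatch)` still truncates before the conversion.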
2020-10-29 06:50:37 -07:00
liujinhui
736a940854
[HUDI-1274] Make Hive synchronization support hourly partitions (#2122)
2020-10-29 11:29:50 +08:00
n3nash
e109a61803
1. Fix merge on read DAG to make docker demo pass (#2092)
1. Fix merge on read DAG to make docker demo pass (#2092)
2. Fix repeat_count, rollback node
2020-10-28 22:34:26 -04:00
wangxianghu
e206ddd431
[MINOR] Make the NoArgsConstructor of SparkMergeHelper private and clean up code (#2194)
2020-10-26 12:22:11 +08:00
lw0090
8545ea3856
[HUDI-1118] Cleanup rollback files residing in .hoodie folder (#2205)
2020-10-25 21:04:56 -07:00
Prashant Wason
49e855c348
[HUDI-1326] Added an API to force publish metrics and flush them (#2152)
* [HUDI-1326] Added an API to force publish metrics and flush them.
Using the added API, publish metrics after each level of the DAG completes in hudi-test-suite.
* Code cleanups
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-10-24 16:47:24 -07:00
Raymond Xu
14c4611857
[MINOR] Fix caller to SparkBulkInsertCommitActionExecutor (#2195)
Fixed calling the wrong constructor
2020-10-21 19:50:10 -07:00