1344 Commits

Author SHA1 Message Date
ZhangChaoMing
291f92069e [MINOR] Fix wrong logic for checking state condition (#2524) 2021-02-06 16:40:31 +08:00
n3nash
b2c47a24be [HUDI-1589] Fix Rollback Metadata AVRO backwards incompatiblity (#2543) 2021-02-05 16:03:34 -08:00
Sivabalan Narayanan
b5d4a046bb [HUDI-1571] Adding commit_show_records_info to display record sizes for commit (#2514) 2021-02-05 07:53:24 -05:00
hiscat
b51b3a39a8 [HUDI-1420] HoodieTableMetaClient.getMarkerFolderPath works incorrectly on windows client with hdfs server for wrong file seperator (#2526)
* Fix HUDI-1420

FIX https://issues.apache.org/jira/browse/HUDI-1420

* fix(hudi-common): fix HUDI-1420 HoodieTableMetaClient.getMarkerFolderPath works incorrectly on windows client with hdfs server for wrong file seperator

Co-authored-by: 谢波 <xiebo1@yonghui.cn>
2021-02-05 16:24:35 +08:00
Sivabalan Narayanan
4a5683d54a [MINOR] Fixing the default value for source ordering field for payload config (#2516) 2021-02-04 08:43:03 -05:00
wangxianghu
647e9faf25 [HUDI-1547] CI intermittent failure: TestJsonStringToHoodieRecordMapF… (#2521) 2021-02-04 11:20:01 +08:00
Volodymyr Burenin
17802569fd [HUDI-1538] Try to init class trying different signatures instead of checking its name (#2476)
* [HUDI-1538] Try to init class trying different signatures instead of checking its name.

* Removed unused imports

Co-authored-by: volodymyr.burenin <volodymyr.burenin@cloudkitchens.com>
2021-02-03 12:29:08 -08:00
Sivabalan Narayanan
eb91e5ba70 [HUDI-1523] Call mkdir(partition) only if not exists (#2501) 2021-02-03 09:02:37 -05:00
wangxianghu
d74d8e2084 [HUDI-1335] Introduce FlinkHoodieSimpleIndex to hudi-flink-client (#2271) 2021-02-03 08:59:49 +08:00
vinoyang
50ff9ab2d2 [MINOR] Rename FileSystemViewHandler to RequestHandler and corrected the class comment (#2458) 2021-02-02 09:15:53 -08:00
jackiehff
ec950b4cfe [MINOR] Fix method comment typo (#2518)
Co-authored-by: 黄飞飞 <huangfeifei@mininglamp.com>
2021-02-02 19:23:29 +08:00
pengzhiwei
0d8a4d0a56 [HUDI-1550] Honor ordering field for MOR Spark datasource reader (#2497) 2021-02-01 21:04:27 +08:00
steven zhang
f159c0c49a [HUDI-1519] Improve minKey/maxKey computation in HoodieHFileWriter (#2427)
Co-authored-by: zhang wen <steven@stevendeMac-mini.local>
2021-02-01 07:51:57 -05:00
jiangjiguang
5d053b495b [MINOR] Quickstart.generateUpdates method add check (#2505) 2021-01-30 10:28:00 +08:00
satishkotha
9cb6cb8189 [HUDI-1266] Add unit test for validating replacecommit rollback (#2418) 2021-01-29 10:28:08 -08:00
satishkotha
2d2d5c83b1 [HUDI-1555] Remove isEmpty to improve clustering execution performance (#2502) 2021-01-29 10:27:09 -08:00
wangxianghu
23f2ef3efb [HUDI-623] Remove UpgradePayloadFromUberToApache (#2455) 2021-01-28 17:48:50 -08:00
Danny Chan
bc0325f6ea [HUDI-1522] Add a new pipeline for Flink writer (#2430)
* [HUDI-1522] Add a new pipeline for Flink writer
2021-01-28 08:53:13 +08:00
wangxianghu
7b2e658ac0 [MINOR] Add Jira URL and Mailing List (#2404) 2021-01-27 19:48:42 -05:00
SteNicholas
2ee1c3fb0c [HUDI-1234] Insert new records to data files without merging for "Insert" operation. (#2111)
* Added HoodieConcatHandle to skip merging for "insert" operation when the corresponding config is set

Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-01-27 13:09:51 -05:00
luokey
a54550d94f [MINOR]Fix NPE when using HoodieFlinkStreamer with multi parallelism (#2492) 2021-01-27 21:00:20 +08:00
vinoth chandar
c8ee40f8ae [MINOR] Update doap with 0.7.0 release (#2491) 2021-01-26 09:28:22 -08:00
Shen Hong
c4afd179c1 [HUDI-1476] Introduce unit test infra for java client (#2478) 2021-01-24 11:17:19 -08:00
vinoth chandar
81836f0309 Removing spring repos from pom (#2481)
- These are being deprecated
- Causes build issues when .m2 does not have this cached already
2021-01-24 07:42:52 -08:00
Raymond Xu
84df26323d [MINOR] Use skipTests flag for skip.hudi-spark2.unit.tests property (#2477) 2021-01-24 21:36:41 +08:00
wangxianghu
e302c6bc12 [HUDI-1453] Fix NPE using HoodieFlinkStreamer to etl data from kafka to hudi (#2474) 2021-01-23 10:27:40 +08:00
wangxianghu
d3ea0f957e [HOTFIX] Revert upgrade flink verison to 1.12.0 (#2473) 2021-01-22 10:55:46 -08:00
cooper
048633da1a [MINOR] Improve code readability,remove the continue keyword (#2459) 2021-01-22 13:47:14 +08:00
wangxianghu
748dcc9aae [MINOR] Remove InstantGeneratorOperator parallelism limit in HoodieFlinkStreamer and update docs (#2471) 2021-01-22 13:46:25 +08:00
Xiang Yang
641abe8ab7 [HUDI-1332] Introduce FlinkHoodieBloomIndex to hudi-flink-client (#2375)
* [HUDI] Add bloom index for hudi-flink-client

Co-authored-by: yangxiang <yangxiang@oppo.com>
2021-01-22 10:36:28 +08:00
luokey
b64d22e047 [HUDI-1511] InstantGenerateOperator support multiple parallelism (#2434) 2021-01-22 09:17:50 +08:00
wenningd
976420c49a [HUDI-1512] Fix spark 2 unit tests failure with Spark 3 (#2412)
* [HUDI-1512] Fix spark 2 unit tests failure with Spark 3

* resolve comments

Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-01-21 07:04:28 -08:00
vinoth chandar
81ccb0c71a [MINOR] Make a separate travis CI job for hudi-utilities (#2469) 2021-01-20 21:46:05 -08:00
vinoth chandar
5e30fc1b2b [MINOR] Disabling problematic tests temporarily to stabilize CI (#2468) 2021-01-20 14:24:34 -08:00
Vinoth Chandar
3719e7b388 Moving to 0.8.0-SNAPSHOT on master branch. 2021-01-20 11:31:22 -08:00
liujinhui
244f6def9c [MINOR] Fix dataSource cannot use hoodie.datasource.hive_sync.auto_create_database (#2444)
fix dataSource cannot use hoodie.datasource.hive_sync.auto_create_database
2021-01-20 22:58:18 +08:00
teeyog
c931dc5406 [MINOR] Remove redundant judgments (#2466) 2021-01-20 20:41:09 +08:00
vinoth chandar
5ca0625b27 [HUDI 1308] Harden RFC-15 Implementation based on production testing (#2441)
Addresses leaks, perf degradation observed during testing. These were regressions from the original rfc-15 PoC implementation.

* Pass a single instance of HoodieTableMetadata everywhere
* Fix tests and add config for enabling metrics
 - Removed special casing of assumeDatePartitioning inside FSUtils#getAllPartitionPaths()
 - Consequently, IOException is never thrown and many files had to be adjusted
- More diligent handling of open file handles in metadata table
 - Added config for controlling reuse of connections
 - Added config for turning off fallback to listing, so we can see tests fail
 - Changed all ipf listing code to cache/amortize the open/close for better performance
 - Timelineserver also reuses connections, for better performance
 - Without timelineserver, when metadata table is opened from executors, reuse is not allowed
 - HoodieMetadataConfig passed into HoodieTableMetadata#create as argument.
 -  Fix TestHoodieBackedTableMetadata#testSync
2021-01-19 21:20:28 -08:00
Sivabalan Narayanan
e23967b9e9 [HUDI-1540] Fixing commons codec shading in spark bundle (#2460) 2021-01-20 00:00:13 -05:00
Sivabalan Narayanan
91b9cb53d3 [MINOR] Fixing setting defaults for index config (#2457) 2021-01-19 18:16:25 -05:00
Sivabalan Narayanan
b9c2856d16 [HUDI-1535] Fix 0.7.0 snapshot (#2456)
* Revert "[MINOR] Bumping snapshot version to 0.7.0 (#2435)"

This reverts commit a43e191d6c.

* Fixing 0.7.0 snapshot bump
2021-01-19 12:20:43 -08:00
Volodymyr Burenin
a38612b10f [HUDI-1532] Fixed suboptimal implementation of a magic sequence search (#2440)
* Fixed suboptimal implementation of a magic sequence search on GCS.

* Fix comparison.

* Added buffered reader around plugged storage plugin such as GCS.

* 1. Corrected some comments 2. Refactored GCS input stream check

Co-authored-by: volodymyr.burenin <volodymyr.burenin@cloudkitchens.com>
Co-authored-by: Nishith Agarwal <nagarwal@uber.com>
2021-01-18 23:07:27 -08:00
Udit Mehrotra
684e12e9fc [HUDI-1529] Add block size to the FileStatus objects returned from metadata table to avoid too many file splits (#2451) 2021-01-18 07:29:53 -08:00
satishkotha
3d1d5d00b0 [HUDI-1533] Make SerializableSchema work for large schemas and add ability to sortBy numeric values (#2453) 2021-01-17 12:36:55 -08:00
Sivabalan Narayanan
a43e191d6c [MINOR] Bumping snapshot version to 0.7.0 (#2435) 2021-01-16 09:56:28 -05:00
n3nash
749f657856 [HUDI-1509]: Reverting LinkedHashSet changes to combine fields from oldSchema and newSchema in favor of using only new schema for record rewriting (#2424) 2021-01-14 12:47:50 -08:00
n3nash
e926c1a45c HUDI-1525 fix test hbase index (#2436) 2021-01-12 23:30:21 -08:00
Sivabalan Narayanan
e3d3677b7e [HUDI-1502] MOR rollback and restore support for metadata sync (#2421)
- Adds field to RollbackMetadata that capture the logs written for rollback blocks
- Adds field to RollbackMetadata that capture new logs files written by unsynced deltacommits

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-01-11 13:23:13 -08:00
lw0090
de42adc230 [HUDI-1520] add configure for spark sql overwrite use INSERT_OVERWRITE_TABLE (#2428) 2021-01-11 09:07:47 -08:00
Udit Mehrotra
7ce3ac778e [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partiton paths (#2417)
* [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partition paths

* Adding testClass for FileSystemBackedTableMetadata

Co-authored-by: Nishith Agarwal <nagarwal@uber.com>
2021-01-10 21:19:52 -08:00