127 Commits

Author SHA1 Message Date
Shiyan Xu
eee6a02f77 [HUDI-4456] Clean up test resources (#6203) 2022-07-25 10:13:06 -05:00
Shiyan Xu
71c2c3102b [HUDI-4455] Improve test classes for TestHiveSyncTool (#6202)
Improve HiveTestService, HiveTestUtil, and related classes.
2022-07-25 19:05:34 +05:30
冯健
340c3dbbe1 [HUDI-4437] Fix test conflicts by clearing file system cache (#6123)
Co-authored-by: jian.feng <fengjian428@gmial.com>
Co-authored-by: jian.feng <jian.feng@shopee.com>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-07-22 17:58:04 -07:00
Shiyan Xu
d5c7c79d87 Revert "[HUDI-4324] Remove use_jdbc config from hudi sync (#6072)" (#6160)
This reverts commit 046044c83d.
2022-07-22 17:18:45 -07:00
Shiyan Xu
6b84384022 Revert "[MINOR] Fix CI issue with TestHiveSyncTool (#6110)" (#6192)
This reverts commit d5c904e10e.
2022-07-22 12:20:39 -07:00
Shiyan Xu
d5c904e10e [MINOR] Fix CI issue with TestHiveSyncTool (#6110) 2022-07-22 10:30:00 -05:00
Shiyan Xu
726e8e3590 [MINOR] Disable TestHiveSyncGlobalCommitTool (#6119) 2022-07-15 10:23:21 -07:00
Shiyan Xu
51244eba82 [HUDI-4323] Make database table names optional in sync tool (#6073)
* [HUDI-4323] Make database table names optional in sync tool
* Infer these properties from the table config
2022-07-11 10:03:31 +05:30
Shiyan Xu
046044c83d [HUDI-4324] Remove use_jdbc config from hudi sync (#6072)
* [HUDI-4324] Remove use_jdbc config from hudi sync
* Users should use HIVE_SYNC_MODE instead
2022-07-10 11:16:09 +05:30
Shiyan Xu
6187622178 [MINOR] Improve variable names (#6039) 2022-07-04 18:03:50 -07:00
voonhous
c091e4cc30 [HUDI-3730] Add ConfigTool#toMap UT (#6035)
Co-authored-by: voonhou.su <voonhou.su@shopee.com>
2022-07-04 15:07:19 -07:00
Shiyan Xu
c0e1587966 [HUDI-3730] Improve meta sync class design and hierarchies (#5854)
* [HUDI-3730] Improve meta sync class design and hierarchies (#5754)
* Implements class design proposed in RFC-55

Co-authored-by: jian.feng <fengjian428@gmial.com>
Co-authored-by: jian.feng <jian.feng@shopee.com>
2022-07-03 14:47:25 +05:30
bschell
fd7d25ab63 [HUDI-1176] Upgrade hudi to log4j2 (#5366)
* Move to log4j2

cr: https://code.amazon.com/reviews/CR-71010705

* Upgrade unit tests to log4j2

* update exclusion

Co-authored-by: Brandon Scheller <bschelle@amazon.com>
2022-06-28 12:54:23 -07:00
Sivabalan Narayanan
0a9e568ff5 [HUDI-5246] Bumping mysql connector version due to security vulnerability (#5851) 2022-06-26 16:54:57 -07:00
Shiyan Xu
5aaac21d1d [HUDI-4224] Fix CI issues (#5842)
- Upgrade junit to 5.7.2
- Downgrade surefire and failsafe to 2.22.2
- Fix test failures that were previously not reported
- Improve azure pipeline configs

Co-authored-by: liujinhui1994 <965147871@qq.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
2022-06-12 11:44:18 -07:00
Raymond Xu
1349b596a1 [HUDI-4198] Fix hive config for AWSGlueClientFactory (#5768)
* HiveConf needs to load fs conf to allow instantiation via AWSGlueClientFactory

* Resolve metastore uri config before loading fs conf

* Skip hiveql due to CI issue

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2022-06-07 20:21:31 +05:30
Heap
47b764ec33 [HUDI-4134] Fix Method naming consistency issues in FSUtils (#5655) 2022-05-23 15:28:48 -07:00
felixYyu
716e995a38 [MINOR] Removing redundant semicolons and line breaks (#5662) 2022-05-23 15:26:36 -07:00
huberylee
85b146d3d5 [HUDI-3985] Refactor DLASyncTool to support read hoodie table as spark datasource table (#5532) 2022-05-20 22:25:32 +08:00
董可伦
b8e465fdfc [MINOR] Fix typos in log4j-surefire.properties (#5212) 2022-04-15 13:33:37 -07:00
Raymond Xu
5e65aefc61 [HUDI-3837] Fix license and rat check settings (#5273)
- add missing licenses
- fix CI setting to run rat plugin
- fix deploy script to include integ test modules
2022-04-09 11:01:18 -07:00
Raymond Xu
e96f08f355 Moving to 0.12.0-SNAPSHOT on master branch. 2022-04-06 15:24:10 +08:00
ForwardXu
3449e86989 [HUDI-3780] improve drop partitions (#5178) 2022-04-05 11:52:33 +08:00
Vinoth Govindarajan
20964df770 [HUDI-3357] MVP implementation of BigQuerySyncTool (#5125)
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-04-02 13:18:06 -07:00
todd5167
eef3f9c74a [HUDI-3771] flink supports sync table information to aws glue (#5202) 2022-04-02 21:16:10 +08:00
codejoyan
51a701cef1 [HUDI-3020] Utility to create manifest file (#5153)
Co-authored-by: joyan <joyan.sil@walmart.com>
2022-03-31 07:22:03 -07:00
Raymond Xu
31d4a16deb [HUDI-3536] Add hudi-datahub-sync implementation (#5155) 2022-03-30 14:38:02 -07:00
ForwardXu
0802510ca9 [HUDI-2520] Fix drop partition issue when sync to hive (#5147) 2022-03-29 11:28:19 -07:00
Raymond Xu
6ccbae4d2a [HUDI-2757] Implement Hudi AWS Glue sync (#5076) 2022-03-28 14:54:59 -04:00
Raymond Xu
686da41696 [HUDI-3689] Fix UT failures in TestHoodieDeltaStreamer (#5120) 2022-03-24 09:10:33 -07:00
Rajesh Mahindra
5f570ea151 [HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs (#4175)
- Refactor hive sync tool / config to use reflection and standardize configs

Co-authored-by: sivabalan <n.siva.b@gmail.com>
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-03-21 22:56:31 -04:00
MrSleeping123
8859b48b2a [HUDI-3383] Sync column comments while syncing a hive table (#4960)
Desc: Add a hive sync config (hoodie.datasource.hive_sync.sync_comment). This config defaults to false.
While syncing a data source to Hudi, if column comments are added to the source Avro schema and sync_comment is true, the column comments are synced to the Hive table.
2022-03-10 09:44:39 +08:00
Yann Byron
2fe7a3a41f [HUDI-2610] pass the spark version when sync the table created by spark (#4758)
* [HUDI-2610] pass the spark version when sync the table created by spark

* [MINOR] sync spark version in DataSourceUtils#buildHiveSyncConfig
2022-02-10 21:05:28 +05:30
ForwardXu
773b317983 [HUDI-2941] Show _hoodie_operation in spark sql results (#4649) 2022-02-07 06:28:13 -08:00
ehui
538db185ca [HUDI-2491] Expose HMS mode metastore uri config option for spark writer (#3962) 2022-02-07 18:13:51 +05:30
Alexey Kudinkin
a68e1dc2db [HUDI-431] Adding support for Parquet in MOR LogBlocks (#4333)
- Adding support for Parquet in MOR tables Log blocks

Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
2022-02-02 14:35:05 -05:00
董可伦
822230d9ea [MINOR] Optimize variable names and logs (#4581) 2022-01-16 16:09:22 +08:00
Sagar Sumit
12e95771ee [HUDI-3235] Fix ClassNotFoundException due to log4j-core dependency (#4574)
- Move log4j-core to top level pom
2022-01-12 11:53:43 -05:00
董可伦
017ddbbfac [MINOR] Fix typos (#4567) 2022-01-11 23:17:10 -08:00
Pratyaksh Sharma
a392e9ba46 [HUDI-485] Corrected the check for incremental sql (#2768)
* [HUDI-485]: corrected the check for incremental sql

* [HUDI-485]: added tests

* code review comments addressed

* [HUDI-485]: added happy flow test case
2022-01-12 08:22:07 +05:30
YueZhang
cf362fb2d5 [MINOR] Fix some code style issues based on check-style plugin (#4532)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-09 01:14:56 -08:00
董可伦
4f6cdd73a3 [HUDI-3192] Spark metastore schema evolution broken (#4533) 2022-01-08 10:48:37 +08:00
董可伦
b1df60672b [MINOR] fix typos in DDLExecutor (#4534) 2022-01-07 07:59:55 -05:00
Danny Chan
0e297c0c4c [HUDI-3171] Sync empty table to hive metastore (#4511) 2022-01-05 16:41:33 +08:00
YueZhang
1e2d2c437d [HUDI-3138] Fix broken UT test for TestHiveSyncTool.testDropPartitions (#4493)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-02 22:43:30 -05:00
YueZhang
ef9923fc55 [HUDI-3107] Fix HiveSyncTool drop partitions using JDBC or hivesql or hms (#4453)
* constructDropPartitions when drop partitions using jdbc

* done

* done

* code style

* code review

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-12-31 15:56:33 +08:00
Shawy Geng
a4e622ac61 [HUDI-1951] Add bucket hash index, compatible with the hive bucket (#3173)
* [HUDI-2154] Add index key field to HoodieKey

* [HUDI-2157] Add the bucket index and its read/write implementation of Spark engine.
* revert HUDI-2154 add index key field to HoodieKey
* fix all comments and introduce a new tricky way to get index key at runtime
support double insert for bucket index
* revert spark read optimizer based on bucket index
* add the storage layout
* index tag, hash function and add ut
* fix ut
* address partial comments
* Code review feedback
* add layout config and docs
* fix ut
* rename hoodie.layout and rebase master

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-12-30 12:38:26 -08:00
Udit Mehrotra
9412281cb1 [HUDI-2983] Remove Log4j2 transitive dependencies (#4281) 2021-12-28 07:15:05 -08:00
ForwardXu
32505d5adb [HUDI-3106] Fix HiveSyncTool not sync schema (#4452) 2021-12-27 22:11:14 -08:00
ForwardXu
dbec6c512b [HUDI-3022] Fix NPE for isDropPartition method (#4319)
* [HUDI-3022] Fix NPE for isDropPartition method
2021-12-15 19:38:02 +08:00