Shiyan Xu
6b84384022
Revert "[MINOR] Fix CI issue with TestHiveSyncTool (#6110)" (#6192)
This reverts commit d5c904e10e.
2022-07-22 12:20:39 -07:00
Shiyan Xu
d5c904e10e
[MINOR] Fix CI issue with TestHiveSyncTool (#6110)
2022-07-22 10:30:00 -05:00
Shiyan Xu
726e8e3590
[MINOR] Disable TestHiveSyncGlobalCommitTool (#6119)
2022-07-15 10:23:21 -07:00
Shiyan Xu
51244eba82
[HUDI-4323] Make database table names optional in sync tool (#6073)
* [HUDI-4323] Make database table names optional in sync tool
* Infer these properties from the table config
2022-07-11 10:03:31 +05:30
Shiyan Xu
046044c83d
[HUDI-4324] Remove use_jdbc config from hudi sync (#6072)
* [HUDI-4324] Remove use_jdbc config from hudi sync
* Users should use HIVE_SYNC_MODE instead
2022-07-10 11:16:09 +05:30
Shiyan Xu
6187622178
[MINOR] Improve variable names (#6039)
2022-07-04 18:03:50 -07:00
voonhous
c091e4cc30
[HUDI-3730] Add ConfigTool#toMap UT (#6035)
Co-authored-by: voonhou.su <voonhou.su@shopee.com>
2022-07-04 15:07:19 -07:00
Shiyan Xu
c0e1587966
[HUDI-3730] Improve meta sync class design and hierarchies (#5854)
* [HUDI-3730] Improve meta sync class design and hierarchies (#5754)
* Implements class design proposed in RFC-55
Co-authored-by: jian.feng <fengjian428@gmial.com>
Co-authored-by: jian.feng <jian.feng@shopee.com>
2022-07-03 14:47:25 +05:30
bschell
fd7d25ab63
[HUDI-1176] Upgrade hudi to log4j2 (#5366)
* Move to log4j2
cr: https://code.amazon.com/reviews/CR-71010705
* Upgrade unit tests to log4j2
* update exclusion
Co-authored-by: Brandon Scheller <bschelle@amazon.com>
2022-06-28 12:54:23 -07:00
Sivabalan Narayanan
0a9e568ff5
[HUDI-5246] Bumping mysql connector version due to security vulnerability (#5851)
2022-06-26 16:54:57 -07:00
Shiyan Xu
5aaac21d1d
[HUDI-4224] Fix CI issues (#5842)
- Upgrade junit to 5.7.2
- Downgrade surefire and failsafe to 2.22.2
- Fix test failures that were previously not reported
- Improve azure pipeline configs
Co-authored-by: liujinhui1994 <965147871@qq.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
2022-06-12 11:44:18 -07:00
Raymond Xu
1349b596a1
[HUDI-4198] Fix hive config for AWSGlueClientFactory (#5768)
* HiveConf needs to load fs conf to allow instantiation via AWSGlueClientFactory
* Resolve metastore uri config before loading fs conf
* Skip hiveql due to CI issue
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2022-06-07 20:21:31 +05:30
Heap
47b764ec33
[HUDI-4134] Fix Method naming consistency issues in FSUtils (#5655)
2022-05-23 15:28:48 -07:00
felixYyu
716e995a38
[MINOR] Removing redundant semicolons and line breaks (#5662)
2022-05-23 15:26:36 -07:00
huberylee
85b146d3d5
[HUDI-3985] Refactor DLASyncTool to support read hoodie table as spark datasource table (#5532)
2022-05-20 22:25:32 +08:00
董可伦
b8e465fdfc
[MINOR] Fix typos in log4j-surefire.properties (#5212)
2022-04-15 13:33:37 -07:00
Raymond Xu
5e65aefc61
[HUDI-3837] Fix license and rat check settings (#5273)
- add missing licenses
- fix CI setting to run rat plugin
- fix deploy script to include integ test modules
2022-04-09 11:01:18 -07:00
Raymond Xu
e96f08f355
Moving to 0.12.0-SNAPSHOT on master branch.
2022-04-06 15:24:10 +08:00
ForwardXu
3449e86989
[HUDI-3780] improve drop partitions (#5178)
2022-04-05 11:52:33 +08:00
Vinoth Govindarajan
20964df770
[HUDI-3357] MVP implementation of BigQuerySyncTool (#5125)
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-04-02 13:18:06 -07:00
todd5167
eef3f9c74a
[HUDI-3771] flink supports sync table information to aws glue (#5202)
2022-04-02 21:16:10 +08:00
codejoyan
51a701cef1
[HUDI-3020] Utility to create manifest file (#5153)
Co-authored-by: joyan <joyan.sil@walmart.com>
2022-03-31 07:22:03 -07:00
Raymond Xu
31d4a16deb
[HUDI-3536] Add hudi-datahub-sync implementation (#5155)
2022-03-30 14:38:02 -07:00
ForwardXu
0802510ca9
[HUDI-2520] Fix drop partition issue when sync to hive (#5147)
2022-03-29 11:28:19 -07:00
Raymond Xu
6ccbae4d2a
[HUDI-2757] Implement Hudi AWS Glue sync (#5076)
2022-03-28 14:54:59 -04:00
Raymond Xu
686da41696
[HUDI-3689] Fix UT failures in TestHoodieDeltaStreamer (#5120)
2022-03-24 09:10:33 -07:00
Rajesh Mahindra
5f570ea151
[HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs (#4175)
- Refactor hive sync tool / config to use reflection and standardize configs
Co-authored-by: sivabalan <n.siva.b@gmail.com>
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-03-21 22:56:31 -04:00
MrSleeping123
8859b48b2a
[HUDI-3383] Sync column comments while syncing a hive table (#4960)
Desc: Add a hive sync config (hoodie.datasource.hive_sync.sync_comment), which defaults to false.
When syncing a data source to hudi, if column comments are added to the source avro schema and sync_comment is true, the column comments are synced to the hive table.
2022-03-10 09:44:39 +08:00
Yann Byron
2fe7a3a41f
[HUDI-2610] pass the spark version when sync the table created by spark (#4758)
* [HUDI-2610] pass the spark version when sync the table created by spark
* [MINOR] sync spark version in DataSourceUtils#buildHiveSyncConfig
2022-02-10 21:05:28 +05:30
ForwardXu
773b317983
[HUDI-2941] Show _hoodie_operation in spark sql results (#4649)
2022-02-07 06:28:13 -08:00
ehui
538db185ca
[HUDI-2491] Expose HMS mode metastore uri config option for spark writer (#3962)
2022-02-07 18:13:51 +05:30
Alexey Kudinkin
a68e1dc2db
[HUDI-431] Adding support for Parquet in MOR LogBlocks (#4333)
- Adding support for Parquet in MOR tables Log blocks
Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
2022-02-02 14:35:05 -05:00
董可伦
822230d9ea
[MINOR] Optimize variable names and logs (#4581)
2022-01-16 16:09:22 +08:00
Sagar Sumit
12e95771ee
[HUDI-3235] Fix ClassNotFoundException due to log4j-core dependency (#4574)
- Move log4j-core to top level pom
2022-01-12 11:53:43 -05:00
董可伦
017ddbbfac
[MINOR] Fix typos (#4567)
2022-01-11 23:17:10 -08:00
Pratyaksh Sharma
a392e9ba46
[HUDI-485] Corrected the check for incremental sql (#2768)
* [HUDI-485]: corrected the check for incremental sql
* [HUDI-485]: added tests
* code review comments addressed
* [HUDI-485]: added happy flow test case
2022-01-12 08:22:07 +05:30
YueZhang
cf362fb2d5
[MINOR] Fix some code style issues based on check-style plugin (#4532)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-09 01:14:56 -08:00
董可伦
4f6cdd73a3
[HUDI-3192] Spark metastore schema evolution broken (#4533)
2022-01-08 10:48:37 +08:00
董可伦
b1df60672b
[MINOR] fix typos in DDLExecutor (#4534)
2022-01-07 07:59:55 -05:00
Danny Chan
0e297c0c4c
[HUDI-3171] Sync empty table to hive metastore (#4511)
2022-01-05 16:41:33 +08:00
YueZhang
1e2d2c437d
[HUDI-3138] Fix broken UT test for TestHiveSyncTool.testDropPartitions (#4493)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-02 22:43:30 -05:00
YueZhang
ef9923fc55
[HUDI-3107] Fix HiveSyncTool drop partitions using JDBC or hivesql or hms (#4453)
* constructDropPartitions when drop partitions using jdbc
* done
* done
* code style
* code review
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-12-31 15:56:33 +08:00
Shawy Geng
a4e622ac61
[HUDI-1951] Add bucket hash index, compatible with the hive bucket (#3173)
* [HUDI-2154] Add index key field to HoodieKey
* [HUDI-2157] Add the bucket index and its read/write implementation of Spark engine.
* revert HUDI-2154 add index key field to HoodieKey
* fix all comments and introduce a new tricky way to get index key at runtime
support double insert for bucket index
* revert spark read optimizer based on bucket index
* add the storage layout
* index tag, hash function and add ut
* fix ut
* address partial comments
* Code review feedback
* add layout config and docs
* fix ut
* rename hoodie.layout and rebase master
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-12-30 12:38:26 -08:00
Udit Mehrotra
9412281cb1
[HUDI-2983] Remove Log4j2 transitive dependencies (#4281)
2021-12-28 07:15:05 -08:00
ForwardXu
32505d5adb
[HUDI-3106] Fix HiveSyncTool not sync schema (#4452)
2021-12-27 22:11:14 -08:00
ForwardXu
dbec6c512b
[HUDI-3022] Fix NPE for isDropPartition method (#4319)
* [HUDI-3022] Fix NPE for isDropPartition method
2021-12-15 19:38:02 +08:00
ForwardXu
dd96129191
[HUDI-2990] Sync to HMS when deleting partitions (#4291)
2021-12-13 20:40:06 +08:00
fengli
568181a3e7
[HUDI-2934] Optimize RequestHandler code style
close apache/hudi#4215
2021-12-04 15:30:52 +08:00
yuzhao.cyz
a1d0ff4209
Moving to 0.11.0-SNAPSHOT on master branch.
2021-11-27 17:22:10 +08:00
Nate Radtke
887787e8b9
[HUDI-1932] Update Hive sync timestamp when change detected (#3053)
* Update Hive sync timestamp when change detected
Only update the last commit timestamp on the Hive table when the table schema
has changed or a partition is created/updated.
When using AWS Glue Data Catalog as the metastore for Hive this will ensure
that table versions are substantive (including schema and/or partition
changes). Prior to this change when a Hive sync is performed without schema
or partition changes the table in the Glue Data Catalog would have a new
version published with the only change being the timestamp property.
https://issues.apache.org/jira/browse/HUDI-1932
* add conditional sync flag
* fix testSyncWithoutDiffs
* fix HiveSyncConfig
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2021-11-21 12:11:05 +05:30