Commit Graph

41 Commits

Author SHA1 Message Date
Alexey Kudinkin
a68e1dc2db [HUDI-431] Adding support for Parquet in MOR LogBlocks (#4333)
- Adding support for Parquet in MOR tables Log blocks

Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
2022-02-02 14:35:05 -05:00
Pratyaksh Sharma
a392e9ba46 [HUDI-485] Corrected the check for incremental sql (#2768)
* [HUDI-485]: corrected the check for incremental sql

* [HUDI-485]: added tests

* code review comments addressed

* [HUDI-485]: added happy flow test case
2022-01-12 08:22:07 +05:30
YueZhang
cf362fb2d5 [MINOR] Fix some code style issues based on check-style plugin (#4532)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-09 01:14:56 -08:00
YueZhang
1e2d2c437d [HUDI-3138] Fix broken UT test for TestHiveSyncTool.testDropPartitions (#4493)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-02 22:43:30 -05:00
YueZhang
ef9923fc55 [HUDI-3107]Fix HiveSyncTool drop partitions using JDBC or hivesql or hms (#4453)
* constructDropPartitions when drop partitions using jdbc

* done

* done

* code style

* code review

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-12-31 15:56:33 +08:00
ForwardXu
dd96129191 [HUDI-2990] Sync to HMS when deleting partitions (#4291) 2021-12-13 20:40:06 +08:00
Nate Radtke
887787e8b9 [HUDI-1932] Update Hive sync timestamp when change detected (#3053)
* Update Hive sync timestamp when change detected

Only update the last commit timestamp on the Hive table when the table schema
has changed or a partition is created/updated.

When using AWS Glue Data Catalog as the metastore for Hive this will ensure
that table versions are substantive (including schema and/or partition
changes). Prior to this change when a Hive sync is performed without schema
or partition changes the table in the Glue Data Catalog would have a new
version published with the only change being the timestamp property.

https://issues.apache.org/jira/browse/HUDI-1932

* add conditional sync flag

* fix testSyncWithoutDiffs

* fix HiveSyncConfig

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2021-11-21 12:11:05 +05:30
Yann Byron
1f17467f73 [HUDI-1869] Upgrading Spark3 To 3.1 (#3844)
Co-authored-by: pengzhiwei <pengzhiwei2015@icloud.com>
2021-11-02 18:25:12 -07:00
qianchutao
7e887b54d7 [MINOR] fix typo,'SPAKR' corrected to 'SPARK' (#3721) 2021-09-26 21:52:35 +08:00
jsbali
f52cb32f5f [HUDI-2248] Fixing the closing of hms client (#3364)
* [HUDI-2248] Fixing the closing of hms client

* [HUDI-2248] Using Hive.closeCurrent() over client.close()
2021-09-23 13:45:24 -07:00
Raymond Xu
8255a86cb4 [HUDI-1939] remove joda time in hivesync module (#3430) 2021-08-10 20:25:41 -07:00
swuferhong
eedfadeb46 [HUDI-2244] Fix database alreadyExists exception while hive sync (#3361) 2021-07-28 19:40:16 +08:00
Sivabalan Narayanan
61148c1c43 [HUDI-2176, 2178, 2179] Adding virtual key support to COW table (#3306) 2021-07-26 17:21:04 -04:00
jsbali
66207ed91a [HUDI-1848] Adding support for HMS for running DDL queries in hive-sy… (#2879)
* [HUDI-1848] Adding support for HMS for running DDL queries in hive-sync-tool

* [HUDI-1848] Fixing test cases

* [HUDI-1848] CR changes

* [HUDI-1848] Fix checkstyle violations

* [HUDI-1848] Fixed a bug when metastore api fails for complex schemas with multiple levels.

* [HUDI-1848] Adding the complex schema and resolving merge conflicts

* [HUDI-1848] Adding some more javadocs

* [HUDI-1848] Added javadocs for DDLExecutor impls

* [HUDI-1848] Fixed style issue
2021-07-23 09:03:15 -07:00
pengzhiwei
93967404a7 [HUDI-2180] Fix Compile Error For Spark3 (#3274) 2021-07-14 09:02:28 -07:00
pengzhiwei
ffa934182a [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer 2021-07-12 13:03:14 +08:00
vinoth chandar
c50c24908a [MINOR] Fix build broken from #3186 (#3245) 2021-07-08 14:23:52 -07:00
xiarixiaoyao
de07e61382 [HUDI-2099]hive lock which state is WAITING should be released, otherwise this hive lock will be locked forever (#3186) 2021-07-08 10:30:48 -04:00
xiarixiaoyao
6a71412f78 [HUDI-2116] Support batch synchronization of partition data to hive metastore to avoid OOM problem (#3209) 2021-07-04 22:30:36 +08:00
pengzhiwei
4f215e2938 [HUDI-2057] CTAS Generate An External Table When Create Managed Table (#3146) 2021-07-03 15:55:36 +08:00
wenningd
d412fb2fe6 [HUDI-89] Add configOption & refactor all configs based on that (#2833)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-06-30 14:26:30 -07:00
Raymond Xu
0749cc826a [HUDI-2081] Move schema util tests out from TestHiveSyncTool (#3166) 2021-06-29 11:23:46 +08:00
n3nash
23dbc09a0d [MINOR] Removing un-used files and references (#3150) 2021-06-24 22:17:40 -07:00
s-sanjay
0fb8556b0d Add ability to provide multi-region (global) data consistency across HMS in different regions (#2542)
[global-hive-sync-tool] Add a global hive sync tool to sync hudi table across clusters. Add a way to rollback the replicated time stamp if we fail to sync or if we partly sync

Co-authored-by: Jagmeet Bali <jsbali@uber.com>
2021-06-24 20:26:26 -07:00
pengzhiwei
ad53cf450e [HUDI-1879] Fix RO Tables Returning Snapshot Result (#2925) 2021-06-17 04:18:21 -07:00
Raymond Xu
441076b2cc [HUDI-1950] Move TestHiveMetastoreBasedLockProvider to functional (#3043)
HiveTestUtil static setup mini servers caused connection refused issue in Azure CI environment, as TestHiveSyncTool and TestHiveMetastoreBasedLockProvider share the same test facilities. Moving TestHiveMetastoreBasedLockProvider (the easier one) to functional test with a separate and improved mini server setup resolved the issue.

Also cleaned up dfs cluster from HiveTestUtil.

The next step is to move TestHiveSyncTool to functional as well.
2021-06-07 15:38:59 -07:00
li36909
2c5a661a64 [HUDI-1759] Save one connection retry to hive metastore when hiveSyncTool run with useJdbc=false (#2759)
* [HUDI-1759] Save one connection retry to hive metastore when hiveSyncTool run with useJdbc=false

* Fix review comment
2021-05-07 15:30:26 -07:00
pengzhiwei
c9bcb5e33f [HUDI-1845] Exception Throws When Sync Non-Partitioned Table To Hive With MultiPartKeysValueExtractor (#2876) 2021-04-28 19:11:46 -07:00
Roc Marshal
e4fd195d9f [MINOR] Refactor method up to parent-class (#2822) 2021-04-27 21:32:32 +08:00
pengzhiwei
aacb8be521 [HUDI-1415] Read Hoodie Table As Spark DataSource Table (#2283) 2021-04-20 14:21:38 -07:00
Roc Marshal
f7b6b68063 [MINOR][hudi-sync] Fix typos (#2844) 2021-04-19 16:27:13 +08:00
Vinoth Govindarajan
08e82c469c [HUDI-1762] Added HiveStylePartitionExtractor to support Hive style partitions (#2769) 2021-04-09 01:00:11 -04:00
n3nash
d7b18783bd [HUDI-1709] Improving config names and adding hive metastore uri config (#2699) 2021-03-22 01:22:06 -07:00
n3nash
74241947c1 [HUDI-845] Added locking capability to allow multiple writers (#2374)
* [HUDI-845] Added locking capability to allow multiple writers
1. Added LockProvider API for pluggable lock methodologies
2. Added Resolution Strategy API to allow for pluggable conflict resolution
3. Added TableService client API to schedule table services
4. Added Transaction Manager for wrapping actions within transactions
2021-03-16 16:43:53 -07:00
pengzhiwei
bc883db5de [HUDI-1636] Support Builder Pattern To Build Table Properties For HoodieTableConfig (#2596) 2021-03-05 14:10:27 +08:00
liujinhui
8c2197ae5e [HUDI-1269] Make whether the failure of connect hive affects hudi ingest process configurable (#2443)
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-02-25 10:09:32 -05:00
Sivabalan Narayanan
da2919a75f [HUDI-1383] Fixing sorting of partition vals for hive sync computation (#2402) 2021-01-06 07:49:44 -05:00
liujinhui
736a940854 [HUDI-1274] Make hive synchronization supports hourly partition (#2122) 2020-10-29 11:29:50 +08:00
satishkotha
7fa641ea9a [HUDI-1302] Add support for timestamp field in HiveSync (#2129) 2020-10-13 22:58:00 -07:00
Abhishek Modi
53d1e55110 Test Suite should work with Docker + Unit Tests 2020-09-08 22:41:14 -07:00
lw0090
51ea27d665 [HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync, hudi-dla-sync (#1810)
- Generalize the hive-sync module for syncing to multiple metastores
- Added new options for datasource
- Added new command line for delta streamer

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-08-05 21:34:55 -07:00