lanyuanxiaoyao/hudi - hudi - Gitea

Author	SHA1	Message	Date
ForwardXu	dbec6c512b	[HUDI-3022] Fix NPE for isDropPartition method (#4319)	2021-12-15 19:38:02 +08:00
ForwardXu	dd96129191	[HUDI-2990] Sync to HMS when deleting partitions (#4291)	2021-12-13 20:40:06 +08:00
fengli	568181a3e7	[HUDI-2934] Optimize RequestHandler code style close apache/hudi#4215	2021-12-04 15:30:52 +08:00
yuzhao.cyz	a1d0ff4209	Moving to 0.11.0-SNAPSHOT on master branch.	2021-11-27 17:22:10 +08:00
Nate Radtke	887787e8b9	[HUDI-1932] Update Hive sync timestamp when change detected (#3053) Only update the last commit timestamp on the Hive table when the table schema has changed or a partition is created/updated. When using AWS Glue Data Catalog as the metastore for Hive, this ensures that table versions are substantive (including schema and/or partition changes). Prior to this change, when a Hive sync was performed without schema or partition changes, the table in the Glue Data Catalog would have a new version published with the only change being the timestamp property. https://issues.apache.org/jira/browse/HUDI-1932 * add conditional sync flag * fix testSyncWithoutDiffs * fix HiveSyncConfig Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2021-11-21 12:11:05 +05:30
xiarixiaoyao	acc40625f5	[HUDI-2676] Hudi should synchronize owner information to hudi _rt/_ro table. (#3911)	2021-11-03 20:36:01 +08:00
Yann Byron	1f17467f73	[HUDI-1869] Upgrading Spark3 To 3.1 (#3844) Co-authored-by: pengzhiwei <pengzhiwei2015@icloud.com>	2021-11-02 18:25:12 -07:00
Sivabalan Narayanan	f9bc3e03e5	[MINOR] Adding a deprecated constructor to AbstractSyncHoodieClient (#3902)	2021-11-02 12:16:38 -04:00
vinoyang	b1c4acf0ae	[HUDI-2614] Remove duplicated hadoop-hdfs with tests classifier exists in bundles (#3864)	2021-10-26 22:36:10 +08:00
vinoyang	220bf6a7e6	[HUDI-2600] Remove duplicated hadoop-common with tests classifier exists in bundles (#3847)	2021-10-25 13:45:28 +08:00
董可伦	48a3906ccc	[MINOR] Fix typo, 'paritition' corrected to 'partition' (#3764)	2021-10-11 14:07:34 -04:00
董可伦	10e3a9a3fb	[MINOR] Fix typo, 'properites' corrected to 'properties' (#3738)	2021-10-06 20:37:01 -04:00
Sivabalan Narayanan	5f32162a2f	[HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426 (#3590) * [HUDI-2285] Adding synchronous updates to metadata before completion of commits in data timeline. - This patch adds synchronous updates to the metadata table. In other words, every write is first committed to the metadata table, followed by the data table. While reading the metadata table, we ignore any delta commits that are present only in the metadata table and not in the data table timeline. - Compaction of the metadata table is fenced by the condition that we trigger compaction only when there are no inflight requests in the data table. This ensures that all base files in the metadata table are always in sync with the data table (w/o any holes), and there can only be some extra invalid commits among the delta log files in the metadata table. - Due to this, archival of the data table also fences itself up until the compacted instant in the metadata table. All writes to the metadata table happen within the data table lock, so the metadata table works in single-writer mode only. This might be tough to loosen, since all writers write to the same FILES partition and so would result in a conflict anyway. - As part of this, have added lock acquisition in the data table for those operations which did not take a lock before while committing (rollback, clean, compaction, cluster). To note, we are not doing any conflict resolution. All we are doing here is committing while holding a lock, so that all writes to the metadata table always come from a single writer. - Also added a building block to add buckets for partitions, which will be leveraged by other indexes like the record level index, etc. For now, the FILES partition has only one bucket. In general, any number of buckets per partition is allowed, and each partition has a fixed fileId prefix with an incremental suffix for each bucket within the partition. Have fixed [HUDI-2476]. This fix is about retrying a failed compaction if it succeeded in the metadata table the first time but failed w/ the data table. - Enabling metadata table by default. - Adding more tests for metadata table Co-authored-by: Prashant Wason <pwason@uber.com>	2021-10-06 00:17:52 -04:00
Vinay Patil	73e8ba7620	[HUDI-2499] Making jdbc-url, user and pass as non-required field for other sync modes (#3732)	2021-09-30 11:41:15 -04:00
qianchutao	7e887b54d7	[MINOR] fix typo, 'SPAKR' corrected to 'SPARK' (#3721)	2021-09-26 21:52:35 +08:00
jsbali	f52cb32f5f	[HUDI-2248] Fixing the closing of hms client (#3364) * [HUDI-2248] Using Hive.closeCurrent() over client.close()	2021-09-23 13:45:24 -07:00
董可伦	3a150ee181	[HUDI-2447] Extract common business logic & Fix typo (#3683)	2021-09-17 19:45:22 +08:00
董可伦	8a652171cf	[MINOR] Fix typo, 'compatiblity' corrected to 'compatibility' (#3675)	2021-09-17 09:43:23 +08:00
Wei	4abcb4f659	[MINOR] Remove unused variables (#3631)	2021-09-09 23:21:16 +08:00
Udit Mehrotra	c350d05dd3	Restore 0.8.0 config keys with deprecated annotation (#3506) Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-08-19 13:36:40 -07:00
Udit Mehrotra	3e301196bf	Moving to 0.10.0-SNAPSHOT on master branch.	2021-08-14 18:51:09 -07:00
Raymond Xu	8255a86cb4	[HUDI-1939] remove joda time in hivesync module (#3430)	2021-08-10 20:25:41 -07:00
swuferhong	21db6d7a84	[HUDI-1771] Propagate CDC format for hoodie (#3285)	2021-08-10 20:23:23 +08:00
pengzhiwei	0dcd6a8fca	[HUDI-2233] Use HMS To Sync Hive Meta For Spark Sql (#3387)	2021-08-05 09:57:22 -04:00
swuferhong	eedfadeb46	[HUDI-2244] Fix database alreadyExists exception while hive sync (#3361)	2021-07-28 19:40:16 +08:00
Sivabalan Narayanan	61148c1c43	[HUDI-2176, 2178, 2179] Adding virtual key support to COW table (#3306)	2021-07-26 17:21:04 -04:00
jsbali	66207ed91a	[HUDI-1848] Adding support for HMS for running DDL queries in hive-sy… (#2879) * [HUDI-1848] Adding support for HMS for running DDL queries in hive-sync-tool * [HUDI-1848] Fixing test cases * [HUDI-1848] CR changes * [HUDI-1848] Fix checkstyle violations * [HUDI-1848] Fixed a bug when metastore api fails for complex schemas with multiple levels. * [HUDI-1848] Adding the complex schema and resolving merge conflicts * [HUDI-1848] Adding some more javadocs * [HUDI-1848] Added javadocs for DDLExecutor impls * [HUDI-1848] Fixed style issue	2021-07-23 09:03:15 -07:00
vinoyang	a62a6cff32	[MINOR] Refactor hive sync tool to reduce duplicate code (#3276)	2021-07-15 23:54:38 +08:00
pengzhiwei	93967404a7	[HUDI-2180] Fix Compile Error For Spark3 (#3274)	2021-07-14 09:02:28 -07:00
pengzhiwei	ffa934182a	[HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer	2021-07-12 13:03:14 +08:00
vinoth chandar	c50c24908a	[MINOR] Fix build broken from #3186 (#3245)	2021-07-08 14:23:52 -07:00
xiarixiaoyao	de07e61382	[HUDI-2099] Hive lock whose state is WAITING should be released, otherwise this hive lock will be locked forever (#3186)	2021-07-08 10:30:48 -04:00
xiarixiaoyao	6a71412f78	[HUDI-2116] Support batch synchronization of partition data to hive metastore to avoid oom problem (#3209)	2021-07-04 22:30:36 +08:00
pengzhiwei	4f215e2938	[HUDI-2057] CTAS Generate An External Table When Create Managed Table (#3146)	2021-07-03 15:55:36 +08:00
wenningd	d412fb2fe6	[HUDI-89] Add configOption & refactor all configs based on that (#2833) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2021-06-30 14:26:30 -07:00
Raymond Xu	0749cc826a	[HUDI-2081] Move schema util tests out from TestHiveSyncTool (#3166)	2021-06-29 11:23:46 +08:00
n3nash	23dbc09a0d	[MINOR] Removing un-used files and references (#3150)	2021-06-24 22:17:40 -07:00
s-sanjay	0fb8556b0d	Add ability to provide multi-region (global) data consistency across HMS in different regions (#2542) [global-hive-sync-tool] Add a global hive sync tool to sync hudi table across clusters. Add a way to roll back the replicated timestamp if we fail to sync or only partially sync Co-authored-by: Jagmeet Bali <jsbali@uber.com>	2021-06-24 20:26:26 -07:00
pengzhiwei	ad53cf450e	[HUDI-1879] Fix RO Tables Returning Snapshot Result (#2925)	2021-06-17 04:18:21 -07:00
Wei	75d663f65d	[HUDI-1980] Optimize the code to prevent other exceptions from causing resources not to be closed (#3038) Co-authored-by: wei.zhang2 <wei.zhang2@dmall.com>	2021-06-08 21:58:34 +08:00
pengzhiwei	f760ec543e	[HUDI-1659] Basic Implementation Of Spark Sql Support For Hoodie (#2645) Main functions: Support create table for hoodie. Support CTAS. Support Insert for hoodie, including dynamic partition and static partition insert. Support MergeInto for hoodie. Support DELETE. Support UPDATE. Both support spark2 & spark3 based on DataSourceV1. Main changes: Add sql parser for spark2. Add HoodieAnalysis for sql resolve and logical plan rewrite. Add commands implementation for CREATE TABLE, INSERT, MERGE INTO & CTAS. In order to push down the update & insert logic to the HoodieRecordPayload for MergeInto, I made some changes to the HoodieWriteHandler and other related classes. 1. Add the inputSchema for parsing the incoming record. This is because the inputSchema for MergeInto is different from the writeSchema, as there are some transforms in the update & insert expressions. 2. Add WRITE_SCHEMA to HoodieWriteConfig to pass the write schema for merge into. 3. Pass properties to HoodieRecordPayload#getInsertValue to pass the insert expression and table schema. Verify this pull request: Add TestCreateTable to test creating hoodie tables and CTAS. Add TestInsertTable to test inserting into hoodie tables. Add TestMergeIntoTable to test merging into hoodie tables. Add TestUpdateTable to test updating hoodie tables. Add TestDeleteTable to test deleting from hoodie tables. Add TestSqlStatement to test the currently supported DDL/DML.	2021-06-07 23:24:32 -07:00
Raymond Xu	441076b2cc	[HUDI-1950] Move TestHiveMetastoreBasedLockProvider to functional (#3043) The static mini-server setup in HiveTestUtil caused a connection-refused issue in the Azure CI environment, as TestHiveSyncTool and TestHiveMetastoreBasedLockProvider share the same test facilities. Moving TestHiveMetastoreBasedLockProvider (the easier one) to functional tests with a separate and improved mini server setup resolved the issue. Also cleaned up the dfs cluster from HiveTestUtil. The next step is to move TestHiveSyncTool to functional tests as well.	2021-06-07 15:38:59 -07:00
Wei	dab13f7473	[HUDI-1979] Optimize logic to improve code readability (#3037) Co-authored-by: wei.zhang2 <wei.zhang2@dmall.com>	2021-06-05 19:40:45 +08:00
vinoth chandar	d02c0e5387	[MINOR] Resolve build issue arising from inaccessible pentaho jar (#3034) - Fixes #160 #2479	2021-06-04 15:28:44 -04:00
Volodymyr Burenin	8a48d16e41	[HUDI-1707] Reduces log level for too verbose messages from info to debug level. (#2714) * Sort config output. * Code Review: Small restructuring + rebasing to master - Fixing flaky multi delta streamer test - Using isDebugEnabled() checks - Some changes to shorten log message without moving to DEBUG Co-authored-by: volodymyr.burenin <volodymyr.burenin@cloudkitchens.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-05-10 07:16:02 -07:00
li36909	2c5a661a64	[HUDI-1759] Save one connection retry to hive metastore when hiveSyncTool run with useJdbc=false (#2759) * Fix review comment	2021-05-07 15:30:26 -07:00
pengzhiwei	c9bcb5e33f	[HUDI-1845] Exception Throws When Sync Non-Partitioned Table To Hive With MultiPartKeysValueExtractor (#2876)	2021-04-28 19:11:46 -07:00
Roc Marshal	e4fd195d9f	[MINOR] Refactor method up to parent-class (#2822)	2021-04-27 21:32:32 +08:00
pengzhiwei	aacb8be521	[HUDI-1415] Read Hoodie Table As Spark DataSource Table (#2283)	2021-04-20 14:21:38 -07:00
Roc Marshal	f7b6b68063	[MINOR][hudi-sync] Fix typos (#2844)	2021-04-19 16:27:13 +08:00


78 Commits