lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
董可伦	822230d9ea	[MINOR] Optimize variable names and logs (#4581)	2022-01-16 16:09:22 +08:00
Sagar Sumit	12e95771ee	[HUDI-3235] Fix ClassNotFoundException due to log4j-core dependency (#4574) - Move log4j-core to top-level pom	2022-01-12 11:53:43 -05:00
董可伦	017ddbbfac	[MINOR] Fix typos (#4567)	2022-01-11 23:17:10 -08:00
Pratyaksh Sharma	a392e9ba46	[HUDI-485] Corrected the check for incremental sql (#2768) * [HUDI-485]: corrected the check for incremental sql * [HUDI-485]: added tests * code review comments addressed * [HUDI-485]: added happy flow test case	2022-01-12 08:22:07 +05:30
YueZhang	cf362fb2d5	[MINOR] Fix some code style issues based on check-style plugin (#4532) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2022-01-09 01:14:56 -08:00
董可伦	4f6cdd73a3	[HUDI-3192] Spark metastore schema evolution broken (#4533)	2022-01-08 10:48:37 +08:00
董可伦	b1df60672b	[MINOR] fix typos in DDLExecutor (#4534)	2022-01-07 07:59:55 -05:00
Danny Chan	0e297c0c4c	[HUDI-3171] Sync empty table to hive metastore (#4511)	2022-01-05 16:41:33 +08:00
YueZhang	1e2d2c437d	[HUDI-3138] Fix broken UT test for TestHiveSyncTool.testDropPartitions (#4493) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2022-01-02 22:43:30 -05:00
YueZhang	ef9923fc55	[HUDI-3107] Fix HiveSyncTool drop partitions using JDBC or hivesql or hms (#4453) * constructDropPartitions when drop partitions using jdbc * done * done * code style * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-12-31 15:56:33 +08:00
Shawy Geng	a4e622ac61	[HUDI-1951] Add bucket hash index, compatible with the hive bucket (#3173) * [HUDI-2154] Add index key field to HoodieKey * [HUDI-2157] Add the bucket index and its read/write implementation of Spark engine. * revert HUDI-2154 add index key field to HoodieKey * fix all comments and introduce a new tricky way to get index key at runtime support double insert for bucket index * revert spark read optimizer based on bucket index * add the storage layout * index tag, hash function and add ut * fix ut * address partial comments * Code review feedback * add layout config and docs * fix ut * rename hoodie.layout and rebase master Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-12-30 12:38:26 -08:00
Udit Mehrotra	9412281cb1	[HUDI-2983] Remove Log4j2 transitive dependencies (#4281)	2021-12-28 07:15:05 -08:00
ForwardXu	32505d5adb	[HUDI-3106] Fix HiveSyncTool not syncing schema (#4452)	2021-12-27 22:11:14 -08:00
ForwardXu	dd96129191	[HUDI-2990] Sync to HMS when deleting partitions (#4291)	2021-12-13 20:40:06 +08:00
fengli	568181a3e7	[HUDI-2934] Optimize RequestHandler code style close apache/hudi#4215	2021-12-04 15:30:52 +08:00
yuzhao.cyz	a1d0ff4209	Moving to 0.11.0-SNAPSHOT on master branch.	2021-11-27 17:22:10 +08:00
Nate Radtke	887787e8b9	[HUDI-1932] Update Hive sync timestamp when change detected (#3053) * Update Hive sync timestamp when change detected. Only update the last commit timestamp on the Hive table when the table schema has changed or a partition is created/updated. When using AWS Glue Data Catalog as the metastore for Hive, this will ensure that table versions are substantive (including schema and/or partition changes). Prior to this change, when a Hive sync was performed without schema or partition changes, the table in the Glue Data Catalog would have a new version published with the only change being the timestamp property. https://issues.apache.org/jira/browse/HUDI-1932 * add conditional sync flag * fix testSyncWithoutDiffs * fix HiveSyncConfig Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2021-11-21 12:11:05 +05:30
xiarixiaoyao	acc40625f5	[HUDI-2676] Hudi should synchronize owner information to hudi _rt/_ro tables. (#3911)	2021-11-03 20:36:01 +08:00
Yann Byron	1f17467f73	[HUDI-1869] Upgrading Spark3 To 3.1 (#3844) Co-authored-by: pengzhiwei <pengzhiwei2015@icloud.com>	2021-11-02 18:25:12 -07:00
vinoyang	b1c4acf0ae	[HUDI-2614] Remove duplicated hadoop-hdfs with tests classifier existing in bundles (#3864)	2021-10-26 22:36:10 +08:00
vinoyang	220bf6a7e6	[HUDI-2600] Remove duplicated hadoop-common with tests classifier existing in bundles (#3847)	2021-10-25 13:45:28 +08:00
董可伦	48a3906ccc	[MINOR] Fix typo, 'paritition' corrected to 'partition' (#3764)	2021-10-11 14:07:34 -04:00
Sivabalan Narayanan	5f32162a2f	[HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426 (#3590) * [HUDI-2285] Adding Synchronous updates to metadata before completion of commits in data timeline. - This patch adds synchronous updates to metadata table. In other words, every write is first committed to metadata table followed by data table. While reading metadata table, we ignore any delta commits that are present only in metadata table and not in data table timeline. - Compaction of metadata table is fenced by the condition that we trigger compaction only when there are no inflight requests in datatable. This ensures that all base files in metadata table are always in sync with data table (w/o any holes) and there could only be some extra invalid commits among delta log files in metadata table. - Due to this, archival of data table also fences itself up until compacted instant in metadata table. All writes to metadata table happen within the datatable lock. So, metadata table works in one writer mode only. This might be tough to loosen since all writers write to the same FILES partition and so will result in a conflict anyway. - As part of this, have added lock acquisition in data table while committing for those operations which did not take a lock before (rollback, clean, compaction, cluster). To note, we were not doing any conflict resolution. All we are doing here is to commit by taking a lock, so that metadata table always has a single writer. - Also added building block to add buckets for partitions, which will be leveraged by other indexes like record level index, etc. For now, FILES partition has only one bucket. In general, any number of buckets per partition is allowed and each partition has a fixed fileId prefix with incremental suffix for each bucket within each partition. Have fixed [HUDI-2476]. This fix is about retrying a failed compaction if it succeeded in metadata for first time, but failed w/ data table. - Enabling metadata table by default. - Adding more tests for metadata table Co-authored-by: Prashant Wason <pwason@uber.com>	2021-10-06 00:17:52 -04:00
Vinay Patil	73e8ba7620	[HUDI-2499] Making jdbc-url, user and pass non-required fields for other sync modes (#3732)	2021-09-30 11:41:15 -04:00
qianchutao	7e887b54d7	[MINOR] fix typo, 'SPAKR' corrected to 'SPARK' (#3721)	2021-09-26 21:52:35 +08:00
jsbali	f52cb32f5f	[HUDI-2248] Fixing the closing of hms client (#3364) * [HUDI-2248] Fixing the closing of hms client * [HUDI-2248] Using Hive.closeCurrent() over client.close()	2021-09-23 13:45:24 -07:00
董可伦	8a652171cf	[MINOR] Fix typo, 'compatiblity' corrected to 'compatibility' (#3675)	2021-09-17 09:43:23 +08:00
Wei	4abcb4f659	[MINOR] Remove unused variables (#3631)	2021-09-09 23:21:16 +08:00
Udit Mehrotra	c350d05dd3	Restore 0.8.0 config keys with deprecated annotation (#3506) Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-08-19 13:36:40 -07:00
Udit Mehrotra	3e301196bf	Moving to 0.10.0-SNAPSHOT on master branch.	2021-08-14 18:51:09 -07:00
Raymond Xu	8255a86cb4	[HUDI-1939] remove joda time in hivesync module (#3430)	2021-08-10 20:25:41 -07:00
swuferhong	21db6d7a84	[HUDI-1771] Propagate CDC format for hoodie (#3285)	2021-08-10 20:23:23 +08:00
pengzhiwei	0dcd6a8fca	[HUDI-2233] Use HMS To Sync Hive Meta For Spark Sql (#3387)	2021-08-05 09:57:22 -04:00
swuferhong	eedfadeb46	[HUDI-2244] Fix database alreadyExists exception during hive sync (#3361)	2021-07-28 19:40:16 +08:00
Sivabalan Narayanan	61148c1c43	[HUDI-2176, 2178, 2179] Adding virtual key support to COW table (#3306)	2021-07-26 17:21:04 -04:00
jsbali	66207ed91a	[HUDI-1848] Adding support for HMS for running DDL queries in hive-sy… (#2879) * [HUDI-1848] Adding support for HMS for running DDL queries in hive-sync-tool * [HUDI-1848] Fixing test cases * [HUDI-1848] CR changes * [HUDI-1848] Fix checkstyle violations * [HUDI-1848] Fixed a bug when metastore api fails for complex schemas with multiple levels. * [HUDI-1848] Adding the complex schema and resolving merge conflicts * [HUDI-1848] Adding some more javadocs * [HUDI-1848] Added javadocs for DDLExecutor impls * [HUDI-1848] Fixed style issue	2021-07-23 09:03:15 -07:00
vinoyang	a62a6cff32	[MINOR] Refactor hive sync tool to reduce duplicate code (#3276) * [MINOR] Refactor hive sync tool to reduce duplicate code	2021-07-15 23:54:38 +08:00
pengzhiwei	93967404a7	[HUDI-2180] Fix Compile Error For Spark3 (#3274)	2021-07-14 09:02:28 -07:00
pengzhiwei	ffa934182a	[HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer	2021-07-12 13:03:14 +08:00
vinoth chandar	c50c24908a	[MINOR] Fix build broken by #3186 (#3245)	2021-07-08 14:23:52 -07:00
xiarixiaoyao	de07e61382	[HUDI-2099] Hive lock whose state is WAITING should be released, otherwise this hive lock will be locked forever (#3186)	2021-07-08 10:30:48 -04:00
xiarixiaoyao	6a71412f78	[HUDI-2116] Support batch synchronization of partition data to hive metastore to avoid oom problem (#3209)	2021-07-04 22:30:36 +08:00
pengzhiwei	4f215e2938	[HUDI-2057] CTAS Generate An External Table When Create Managed Table (#3146)	2021-07-03 15:55:36 +08:00
wenningd	d412fb2fe6	[HUDI-89] Add configOption & refactor all configs based on that (#2833) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2021-06-30 14:26:30 -07:00
Raymond Xu	0749cc826a	[HUDI-2081] Move schema util tests out from TestHiveSyncTool (#3166)	2021-06-29 11:23:46 +08:00
n3nash	23dbc09a0d	[MINOR] Removing unused files and references (#3150)	2021-06-24 22:17:40 -07:00
s-sanjay	0fb8556b0d	Add ability to provide multi-region (global) data consistency across HMS in different regions (#2542) [global-hive-sync-tool] Add a global hive sync tool to sync hudi table across clusters. Add a way to roll back the replicated timestamp if we fail to sync or only partly sync. Co-authored-by: Jagmeet Bali <jsbali@uber.com>	2021-06-24 20:26:26 -07:00
pengzhiwei	ad53cf450e	[HUDI-1879] Fix RO Tables Returning Snapshot Result (#2925)	2021-06-17 04:18:21 -07:00
Wei	75d663f65d	[HUDI-1980] Optimize the code to prevent other exceptions from causing resources not to be closed (#3038) Co-authored-by: wei.zhang2 <wei.zhang2@dmall.com>	2021-06-08 21:58:34 +08:00
pengzhiwei	f760ec543e	[HUDI-1659] Basic Implementation Of Spark Sql Support For Hoodie (#2645) Main functions: Support create table for hoodie. Support CTAS. Support Insert for hoodie, including dynamic partition and static partition insert. Support MergeInto for hoodie. Support DELETE. Support UPDATE. Both spark2 & spark3 are supported, based on DataSourceV1. Main changes: Add sql parser for spark2. Add HoodieAnalysis for sql resolve and logical plan rewrite. Add commands implementation for CREATE TABLE, INSERT, MERGE INTO & CTAS. In order to push down the update & insert logic to the HoodieRecordPayload for MergeInto, I made some changes to the HoodieWriteHandler and other related classes. 1. Add the inputSchema for parsing the incoming record. This is because the inputSchema for MergeInto is different from writeSchema as there are some transforms in the update & insert expression. 2. Add WRITE_SCHEMA to HoodieWriteConfig to pass the write schema for merge into. 3. Pass properties to HoodieRecordPayload#getInsertValue to pass the insert expression and table schema. Verify this pull request: Add TestCreateTable to test creating hoodie tables and CTAS. Add TestInsertTable to test inserting into hoodie tables. Add TestMergeIntoTable to test merging into hoodie tables. Add TestUpdateTable to test updating hoodie tables. Add TestDeleteTable to test deleting from hoodie tables. Add TestSqlStatement to test the currently supported ddl/dml.	2021-06-07 23:24:32 -07:00

81 Commits