lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
董可伦	b8e465fdfc	[MINOR] Fix typos in log4j-surefire.properties (#5212 )	2022-04-15 13:33:37 -07:00
Raymond Xu	5e65aefc61	[HUDI-3837] Fix license and rat check settings (#5273 ) - add missing licenses - fix CI setting to run rat plugin - fix deploy script to include integ test modules	2022-04-09 11:01:18 -07:00
Raymond Xu	e96f08f355	Moving to 0.12.0-SNAPSHOT on master branch.	2022-04-06 15:24:10 +08:00
ForwardXu	3449e86989	[HUDI-3780] improve drop partitions (#5178 )	2022-04-05 11:52:33 +08:00
Vinoth Govindarajan	20964df770	[HUDI-3357] MVP implementation of BigQuerySyncTool (#5125 ) Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2022-04-02 13:18:06 -07:00
todd5167	eef3f9c74a	[HUDI-3771] flink supports sync table information to aws glue (#5202 )	2022-04-02 21:16:10 +08:00
codejoyan	51a701cef1	[HUDI-3020] Utility to create manifest file (#5153 ) Co-authored-by: joyan <joyan.sil@walmart.com>	2022-03-31 07:22:03 -07:00
Raymond Xu	31d4a16deb	[HUDI-3536] Add hudi-datahub-sync implementation (#5155 )	2022-03-30 14:38:02 -07:00
ForwardXu	0802510ca9	[HUDI-2520] Fix drop partition issue when sync to hive (#5147 )	2022-03-29 11:28:19 -07:00
Raymond Xu	6ccbae4d2a	[HUDI-2757] Implement Hudi AWS Glue sync (#5076 )	2022-03-28 14:54:59 -04:00
Raymond Xu	686da41696	[HUDI-3689] Fix UT failures in TestHoodieDeltaStreamer (#5120 )	2022-03-24 09:10:33 -07:00
Rajesh Mahindra	5f570ea151	[HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs (#4175 ) - Refactor hive sync tool / config to use reflection and standardize configs Co-authored-by: sivabalan <n.siva.b@gmail.com> Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local> Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2022-03-21 22:56:31 -04:00
MrSleeping123	8859b48b2a	[HUDI-3383] Sync column comments while syncing a hive table (#4960 ) Desc: Add a hive sync config(hoodie.datasource.hive_sync.sync_comment). This config defaults to false. While syncing data source to hudi, add column comments to source avro schema, and the sync_comment is true, syncing column comments to the hive table.	2022-03-10 09:44:39 +08:00
Yann Byron	2fe7a3a41f	[HUDI-2610] pass the spark version when sync the table created by spark (#4758 ) * [HUDI-2610] pass the spark version when sync the table created by spark * [MINOR] sync spark version in DataSourceUtils#buildHiveSyncConfig	2022-02-10 21:05:28 +05:30
ForwardXu	773b317983	[HUDI-2941] Show _hoodie_operation in spark sql results (#4649 )	2022-02-07 06:28:13 -08:00
ehui	538db185ca	[HUDI-2491] Expose HMS mode metastore uri config option for spark writer (#3962 )	2022-02-07 18:13:51 +05:30
Alexey Kudinkin	a68e1dc2db	[HUDI-431] Adding support for Parquet in MOR `LogBlock`s (#4333 ) - Adding support for Parquet in MOR tables Log blocks Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>	2022-02-02 14:35:05 -05:00
董可伦	822230d9ea	[MINOR] Optimize variable names and logs (#4581 )	2022-01-16 16:09:22 +08:00
Sagar Sumit	12e95771ee	[HUDI-3235] Fix ClassNotFoundException due to log4j-core dependency (#4574 ) - Move log4j-core to top level pom	2022-01-12 11:53:43 -05:00
董可伦	017ddbbfac	[MINOR] Fix typos (#4567 )	2022-01-11 23:17:10 -08:00
Pratyaksh Sharma	a392e9ba46	[HUDI-485] Corrected the check for incremental sql (#2768 ) * [HUDI-485]: corrected the check for incremental sql * [HUDI-485]: added tests * code review comments addressed * [HUDI-485]: added happy flow test case	2022-01-12 08:22:07 +05:30
YueZhang	cf362fb2d5	[MINOR] Fix some code style issues based on check-style plugin (#4532 ) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2022-01-09 01:14:56 -08:00
董可伦	4f6cdd73a3	[HUDI-3192] Spark metastore schema evolution broken (#4533 )	2022-01-08 10:48:37 +08:00
董可伦	b1df60672b	[MINOR] fix typos in DDLExecutor (#4534 )	2022-01-07 07:59:55 -05:00
Danny Chan	0e297c0c4c	[HUDI-3171] Sync empty table to hive metastore (#4511 )	2022-01-05 16:41:33 +08:00
YueZhang	1e2d2c437d	[HUDI-3138] Fix broken UT test for TestHiveSyncTool.testDropPartitions (#4493 ) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2022-01-02 22:43:30 -05:00
YueZhang	ef9923fc55	[HUDI-3107]Fix HiveSyncTool drop partitions using JDBC or hivesql or hms (#4453 ) * constructDropPartitions when drop partitions using jdbc * done * done * code style * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-12-31 15:56:33 +08:00
Shawy Geng	a4e622ac61	[HUDI-1951] Add bucket hash index, compatible with the hive bucket (#3173 ) * [HUDI-2154] Add index key field to HoodieKey * [HUDI-2157] Add the bucket index and its read/write implemention of Spark engine. * revert HUDI-2154 add index key field to HoodieKey * fix all comments and introduce a new tricky way to get index key at runtime support double insert for bucket index * revert spark read optimizer based on bucket index * add the storage layout * index tag, hash function and add ut * fix ut * address partial comments * Code review feedback * add layout config and docs * fix ut * rename hoodie.layout and rebase master Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-12-30 12:38:26 -08:00
Udit Mehrotra	9412281cb1	[HUDI-2983] Remove Log4j2 transitive dependencies (#4281 )	2021-12-28 07:15:05 -08:00
ForwardXu	32505d5adb	[HUDI-3106] Fix HiveSyncTool not sync schema (#4452 )	2021-12-27 22:11:14 -08:00
ForwardXu	dbec6c512b	[HUDI-3022] Fix NPE for isDropPartition method (#4319 ) * [HUDI-3022] Fix NPE for isDropPartition method	2021-12-15 19:38:02 +08:00
ForwardXu	dd96129191	[HUDI-2990] Sync to HMS when deleting partitions (#4291 )	2021-12-13 20:40:06 +08:00
fengli	568181a3e7	[HUDI-2934] Optimize RequestHandler code style close apache/hudi#4215	2021-12-04 15:30:52 +08:00
yuzhao.cyz	a1d0ff4209	Moving to 0.11.0-SNAPSHOT on master branch.	2021-11-27 17:22:10 +08:00
Nate Radtke	887787e8b9	[HUDI-1932] Update Hive sync timestamp when change detected (#3053 ) * Update Hive sync timestamp when change detected Only update the last commit timestamp on the Hive table when the table schema has changed or a partition is created/updated. When using AWS Glue Data Catalog as the metastore for Hive this will ensure that table versions are substantive (including schema and/or partition changes). Prior to this change when a Hive sync is performed without schema or partition changes the table in the Glue Data Catalog would have a new version published with the only change being the timestamp property. https://issues.apache.org/jira/browse/HUDI-1932 * add conditional sync flag * fix testSyncWithoutDiffs * fix HiveSyncConfig Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2021-11-21 12:11:05 +05:30
xiarixiaoyao	acc40625f5	[HUDI-2676] Hudi should synchronize owner information to hudi _rt/_ro table. (#3911 )	2021-11-03 20:36:01 +08:00
Yann Byron	1f17467f73	[HUDI-1869] Upgrading Spark3 To 3.1 (#3844 ) Co-authored-by: pengzhiwei <pengzhiwei2015@icloud.com>	2021-11-02 18:25:12 -07:00
Sivabalan Narayanan	f9bc3e03e5	[MINOR] Adding a deprecated constructor to AbstractSyncHoodieClient (#3902 )	2021-11-02 12:16:38 -04:00
vinoyang	b1c4acf0ae	[HUDI-2614] Remove duplicated hadoop-hdfs with tests classifier exists in bundles (#3864 )	2021-10-26 22:36:10 +08:00
vinoyang	220bf6a7e6	[HUDI-2600] Remove duplicated hadoop-common with tests classifier exists in bundles (#3847 )	2021-10-25 13:45:28 +08:00
董可伦	48a3906ccc	[MINOR] Fix typo,'paritition' corrected to 'partition' (#3764 )	2021-10-11 14:07:34 -04:00
董可伦	10e3a9a3fb	[MINOR] Fix typo,'properites' corrected to 'properties' (#3738 )	2021-10-06 20:37:01 -04:00
Sivabalan Narayanan	5f32162a2f	[HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426 (#3590 ) * [HUDI-2285] Adding Synchronous updates to metadata before completion of commits in data timelime. - This patch adds synchronous updates to metadata table. In other words, every write is first committed to metadata table followed by data table. While reading metadata table, we ignore any delta commits that are present only in metadata table and not in data table timeline. - Compaction of metadata table is fenced by the condition that we trigger compaction only when there are no inflight requests in datatable. This ensures that all base files in metadata table is always in sync with data table(w/o any holes) and only there could be some extra invalid commits among delta log files in metadata table. - Due to this, archival of data table also fences itself up until compacted instant in metadata table. All writes to metadata table happens within the datatable lock. So, metadata table works in one writer mode only. This might be tough to loosen since all writers write to same FILES partition and so, will result in a conflict anyways. - As part of this, have added acquiring locks in data table for those operations which were not before while committing (rollback, clean, compaction, cluster). To note, we were not doing any conflict resolution. All we are doing here is to commit by taking a lock. So that all writes to metadata table is always a single writer. - Also added building block to add buckets for partitions, which will be leveraged by other indexes like record level index, etc. For now, FILES partition has only one bucket. In general, any number of buckets per partition is allowed and each partition has a fixed fileId prefix with incremental suffix for each bucket within each partition. Have fixed [HUDI-2476]. This fix is about retrying a failed compaction if it succeeded in metadata for first time, but failed w/ data table. - Enabling metadata table by default. - Adding more tests for metadata table Co-authored-by: Prashant Wason <pwason@uber.com>	2021-10-06 00:17:52 -04:00
Vinay Patil	73e8ba7620	[HUDI-2499] Making jdbc-url, user and pass as non-required field for other sync modes (#3732 )	2021-09-30 11:41:15 -04:00
qianchutao	7e887b54d7	[MINOR] fix typo,'SPAKR' corrected to 'SPARK' (#3721 )	2021-09-26 21:52:35 +08:00
jsbali	f52cb32f5f	[HUDI-2248] Fixing the closing of hms client (#3364 ) * [HUDI-2248] Fixing the closing of hms client * [HUDI-2248] Using Hive.closeCurrent() over client.close()	2021-09-23 13:45:24 -07:00
董可伦	3a150ee181	[HUDI-2447] Extract common business logic & Fix typo (#3683 )	2021-09-17 19:45:22 +08:00
董可伦	8a652171cf	[MINOR] Fix typo,'compatiblity' corrected to 'compatibility' (#3675 )	2021-09-17 09:43:23 +08:00
Wei	4abcb4f659	[MINOR] Remove unused variables (#3631 )	2021-09-09 23:21:16 +08:00
Udit Mehrotra	c350d05dd3	Restore 0.8.0 config keys with deprecated annotation (#3506 ) Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-08-19 13:36:40 -07:00


108 Commits