
108 Commits

Author SHA1 Message Date
Carl-Zhou-CN
dee3a14aae [HUDI-2582] Support concurrent key gen for different tables with row writer path (#3817)
Co-authored-by: yao.zhou <yao.zhou@linkflowtech.com>
2021-11-02 18:05:09 -04:00
xiarixiaoyao
d194643b49 [HUDI-2101][RFC-28] support z-order for hudi (#3330)
* [HUDI-2101]support z-order for hudi

* Renaming some configs for consistency/simplicity.

* Minor code cleanups

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-11-02 09:31:57 -07:00
Y Ethan Guo
0223c442ec [HUDI-2502] Refactor index in hudi-client module (#3778)
- Refactor Index to reduce lines of code and enable reuse across engines.
2021-10-28 04:16:00 -04:00
Yann Byron
1e2be85a0f [HUDI-2482] support 'drop partition' sql (#3754) 2021-10-19 22:09:53 +08:00
Danny Chan
abf3e3fe71 [HUDI-2548] Flink streaming reader misses the rolling over file handles (#3787) 2021-10-14 10:36:18 +08:00
董可伦
10e3a9a3fb [MINOR] Fix typo,'properites' corrected to 'properties' (#3738) 2021-10-06 20:37:01 -04:00
Yann Byron
e91e611afb [HUDI-2456] support 'show partitions' sql (#3693) 2021-10-06 15:46:49 +08:00
Sivabalan Narayanan
5f32162a2f [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426 (#3590)
* [HUDI-2285] Adding synchronous updates to metadata before completion of commits in data timeline.

- This patch adds synchronous updates to the metadata table. In other words, every write is first committed to the metadata table and then to the data table. While reading the metadata table, we ignore any delta commits that are present only in the metadata table and not in the data table timeline.
- Compaction of the metadata table is fenced by the condition that we trigger compaction only when there are no inflight requests in the data table. This ensures that all base files in the metadata table are always in sync with the data table (w/o any holes); the only possible discrepancy is some extra invalid commits among the delta log files in the metadata table.
- Due to this, archival of the data table also fences itself up until the compacted instant in the metadata table.
- All writes to the metadata table happen within the data table lock. So, the metadata table works in single-writer mode only. This might be tough to loosen since all writers write to the same FILES partition and so will conflict anyway.
- As part of this, added lock acquisition in the data table for operations that previously committed without one (rollback, clean, compaction, cluster). Note that we are not doing any conflict resolution. All we are doing here is committing while holding a lock, so that the metadata table always has a single writer.
- Also added a building block to add buckets for partitions, which will be leveraged by other indexes like record level index, etc. For now, the FILES partition has only one bucket. In general, any number of buckets per partition is allowed, and each partition has a fixed fileId prefix with an incremental suffix for each bucket within that partition.
- Fixed [HUDI-2476]. This fix is about retrying a failed compaction if it succeeded in the metadata table the first time but failed with the data table.
- Enabling metadata table by default.
- Adding more tests for metadata table

Co-authored-by: Prashant Wason <pwason@uber.com>
2021-10-06 00:17:52 -04:00
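
The ordering and read-side filtering described in the [HUDI-2285] message above can be illustrated with a minimal sketch. This is not Hudi's implementation or API: the `Timeline` class and the functions `commit_write`, `valid_metadata_instants`, and `can_compact_metadata` are hypothetical names chosen only for illustration, and instants are modeled as plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Timeline:
    """Toy timeline holding completed instant times (hypothetical, not Hudi's class)."""
    completed: set = field(default_factory=set)


def commit_write(instant, metadata_tl, data_tl, fail_data_commit=False):
    """Commit to the metadata table first, then to the data table.

    `fail_data_commit` simulates a writer that dies between the two steps.
    Returns True only if both commits completed.
    """
    metadata_tl.completed.add(instant)   # step 1: metadata table
    if fail_data_commit:
        return False                     # data table never sees this instant
    data_tl.completed.add(instant)       # step 2: data table
    return True


def valid_metadata_instants(metadata_tl, data_tl):
    """Readers ignore metadata commits that are absent from the data timeline."""
    return sorted(metadata_tl.completed & data_tl.completed)


def can_compact_metadata(data_inflight_instants):
    """Compaction fence: allowed only with no inflight requests in the data table."""
    return len(data_inflight_instants) == 0


metadata_tl, data_tl = Timeline(), Timeline()
commit_write("001", metadata_tl, data_tl)
commit_write("002", metadata_tl, data_tl, fail_data_commit=True)
visible = valid_metadata_instants(metadata_tl, data_tl)
```

In this sketch, instant "002" remains in the metadata timeline but is filtered out on read because the data table never completed it, which mirrors the "extra invalid commits" the message mentions.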
Danny Chan
5515a0d319 [HUDI-2479] HoodieFileIndex throws NPE for FileSlice with pure log files (#3702) 2021-09-23 15:14:30 +08:00
董可伦
5a94043f38 [HUDI-2343]Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only (#3517) 2021-09-21 22:11:52 +08:00
pengzhiwei
cc5256a7d8 [HUDI-2357] MERGE INTO doesn't work for tables created using CTAS (#3534) 2021-08-26 16:54:41 +08:00
pengzhiwei
49829f8822 [HUDI-2339] Create Table If Not Exists Failed After Alter Table (#3510) 2021-08-20 14:21:10 +08:00
Udit Mehrotra
c350d05dd3 Restore 0.8.0 config keys with deprecated annotation (#3506)
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-19 13:36:40 -07:00
Sagar Sumit
37c29e75dc [HUDI-2322] Use correct meta columns while preparing dataset for bulk insert (#3504) 2021-08-19 12:07:12 -04:00
liujinhui
5ee35a0a92 HUDI-1674 (#3488) 2021-08-18 13:45:48 +08:00
Y Ethan Guo
23dca6c237 [HUDI-2268] Add upgrade and downgrade to and from 0.9.0 (#3470)
- Added upgrade and downgrade steps to and from 0.9.0. Upgrade adds a few table properties. Downgrade recreates timeline-server-based marker files, if any.
2021-08-14 20:20:23 -04:00
liujinhui
b7da6cb33d [HUDI-2307] When using delete_partition with ds should not rely on the primary key (#3469)
Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
2021-08-14 02:53:39 -04:00
Sagar Sumit
9689278014 [HUDI-1363] Provide option to drop partition columns (#3465)
Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
2021-08-13 13:01:26 -04:00
董可伦
6602e55cd2 [HUDI-2279]Support column name matching for insert * and update set * in merge into (#3415) 2021-08-13 14:10:07 +08:00
Sagar Sumit
0544d70d8f [MINOR] Deprecate older configs (#3464)
Rename and deprecate props in HoodieWriteConfig

Rename and deprecate older props
2021-08-12 20:31:04 -07:00
liujinhui
c0fc9cdaf3 MINOR (#3459)
Move Hoodie DeltaStreamer to hudi-utilities
2021-08-12 18:19:05 +08:00
Sivabalan Narayanan
c9fa3cffaf [HUDI-1774] Adding support for delete_partitions to spark data source (#3437) 2021-08-11 01:03:01 -04:00
Shawy Geng
a5e496fe23 [HUDI-2292] MOR should not predicate pushdown when reading with payload_combine type (#3443) 2021-08-11 12:17:39 +08:00
Sivabalan Narayanan
1196736185 [HUDI-1129] Improving schema evolution support in hudi (#2927)
* Adding support to ingest records with old schema after table's schema is evolved

* Rebasing against latest master

- Trimming test file to be < 800 lines
- Renaming config names

* Addressing feedback

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-10 09:15:37 -07:00
pengzhiwei
41a9986a76 [HUDI-2208] Support Bulk Insert For Spark Sql (#3328) 2021-08-09 00:18:31 -04:00
pengzhiwei
32a50d8ddb [HUDI-2243] Support Time Travel Query For Hoodie Table (#3360) 2021-08-07 19:07:22 -04:00
pengzhiwei
55d2e786db [HUDI-1842] Spark Sql Support For pre-existing Hoodie Table (#3393) 2021-08-07 07:49:26 -04:00
pengzhiwei
9ce548edb1 [MINOR] fix compile error in compaction command (#3421) 2021-08-06 16:18:19 +08:00
pengzhiwei
3f8ca1a355 [HUDI-2182] Support Compaction Command For Spark Sql (#3277) 2021-08-06 15:12:10 +08:00
pengzhiwei
0dcd6a8fca [HUDI-2233] Use HMS To Sync Hive Meta For Spark Sql (#3387) 2021-08-05 09:57:22 -04:00
pengzhiwei
5574e092fb [HUDI-2232] [SQL] MERGE INTO fails with table having nested struct (#3379) 2021-08-04 18:20:29 +08:00
wenningd
91bb0d1318 [HUDI-2255] Refactor Datasource options (#3373)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-08-03 17:50:30 -07:00
Udit Mehrotra
1ff2d3459a [HUDI-1371] [HUDI-1893] Support metadata based listing for Spark DataSource and Spark SQL (#2893) 2021-08-03 14:47:40 -07:00
Sivabalan Narayanan
fe508376fa [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR table (#3315) 2021-08-02 09:45:09 -04:00
pengzhiwei
c2370402ea [HUDI-2251] Fix Exception Cause By Table Name Case Sensitivity For Append Mode Write (#3367) 2021-07-29 17:36:56 -04:00
Shawy Geng
44e41dc9bb [HUDI-2117] Unpersist the input rdd after the commit is completed to … (#3207)
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-07-29 08:16:58 -07:00
pengzhiwei
bbadac7de1 [HUDI-1425] Performance loss with the additional hoodieRecords.isEmpty() in HoodieSparkSqlWriter#write (#2296) 2021-07-28 21:30:18 -07:00
pengzhiwei
59ff8423f9 [HUDI-2223] Fix Alter Partitioned Table Failed (#3350) 2021-07-27 20:01:04 +08:00
Gary Li
925873bb3c [HUDI-2217] Fix no value present in incremental query on MOR (#3340) 2021-07-27 17:30:01 +08:00
董可伦
a91296f14a [HUDI-2216] Correct the words fiels in the comments to fields (#3339) 2021-07-25 12:15:57 +08:00
pengzhiwei
2c910ee3af [HUDI-2212] Missing PrimaryKey In Hoodie Properties For CTAS Table (#3332) 2021-07-23 15:21:57 +08:00
pengzhiwei
5a2f3d439e [HUDI-2139] MergeInto MOR Table May Result InCorrect Result (#3230) 2021-07-23 10:19:43 +08:00
pengzhiwei
151f22e43a [HUDI-2195] Sync Hive Failed When Execute CTAS In Spark2 And Spark3 (#3299) 2021-07-22 15:33:38 +08:00
Sivabalan Narayanan
d5026e9a24 [HUDI-2161] Adding support to disable meta columns with bulk insert operation (#3247) 2021-07-19 20:43:48 -04:00
pengzhiwei
572a214412 [HUDI-1884] MergeInto Support Partial Update For COW (#3154) 2021-07-17 12:59:18 +08:00
Jintao Guan
38cd74b563 [MINOR] Allow users to choose ORC as base file format in Spark SQL (#3279) 2021-07-16 12:24:41 +08:00
vinoth chandar
75040ee9e5 [HUDI-2149] Ensure and Audit docs for every configuration class in the codebase (#3272)
- Added docs when missing
- Rewrote, reworded as needed
- Made a couple more classes extend HoodieConfig
2021-07-14 10:56:08 -07:00
pengzhiwei
f0a2f378ea Merge pull request #3120 from pengzhiwei2018/dev_metasync
[HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
2021-07-13 22:37:20 +08:00
pengzhiwei
ca440ccf88 [HUDI-2107] Support Read Log Only MOR Table For Spark (#3193) 2021-07-12 17:31:23 +08:00
pengzhiwei
ffa934182a [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer 2021-07-12 13:03:14 +08:00