Commit Graph

161 Commits

Author SHA1 Message Date
vinoyang
b1c4acf0ae [HUDI-2614] Remove duplicated hadoop-hdfs with tests classifier exists in bundles (#3864) 2021-10-26 22:36:10 +08:00
Yann Byron
1e2be85a0f [HUDI-2482] support 'drop partition' sql (#3754) 2021-10-19 22:09:53 +08:00
Danny Chan
abf3e3fe71 [HUDI-2548] Flink streaming reader misses the rolling over file handles (#3787) 2021-10-14 10:36:18 +08:00
Sivabalan Narayanan
8a487eafa7 [HUDI-2494] Fixing glob pattern to skip all hoodie meta paths (#3768) 2021-10-12 14:06:40 -04:00
董可伦
10e3a9a3fb [MINOR] Fix typo,'properites' corrected to 'properties' (#3738) 2021-10-06 20:37:01 -04:00
Yann Byron
e91e611afb [HUDI-2456] support 'show partitions' sql (#3693) 2021-10-06 15:46:49 +08:00
Sivabalan Narayanan
5f32162a2f [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426 (#3590)
* [HUDI-2285] Adding synchronous updates to metadata before completion of commits in data timeline.

- This patch adds synchronous updates to the metadata table. In other words, every write is committed first to the metadata table and then to the data table. When reading the metadata table, we ignore any delta commits that are present only in the metadata table and not in the data table timeline.
- Compaction of the metadata table is fenced: we trigger compaction only when there are no inflight requests in the data table. This ensures that all base files in the metadata table are always in sync with the data table (without any holes); only the delta log files in the metadata table may contain some extra, invalid commits.
- Because of this, archival of the data table is also fenced: it proceeds only up to the latest compacted instant in the metadata table.
- All writes to the metadata table happen within the data table lock, so the metadata table works in single-writer mode only. This might be hard to relax, since all writers write to the same FILES partition and would therefore conflict anyway.
- As part of this, we now acquire the data table lock while committing operations that previously did not take it (rollback, clean, compaction, cluster). Note that we do not perform any conflict resolution here; we only take a lock while committing, so that all writes to the metadata table always come from a single writer.
- Also added a building block for adding buckets to partitions, which will be leveraged by other indexes such as the record-level index. For now, the FILES partition has only one bucket. In general, any number of buckets per partition is allowed, and each partition has a fixed fileId prefix with an incremental suffix for each bucket within that partition.
- Fixed [HUDI-2476]: retry a failed compaction if it succeeded in the metadata table the first time but failed in the data table.
- Enabling metadata table by default.
- Adding more tests for metadata table

Co-authored-by: Prashant Wason <pwason@uber.com>
2021-10-06 00:17:52 -04:00
董可伦
2f07e1267f [MINOR] Fix typo Hooodie corrected to Hoodie & reuqired corrected to required (#3730) 2021-09-30 09:55:32 +08:00
Sagar Sumit
bc4966ea73 [HUDI-2484] Fix hive sync mode setting in Deltastreamer (#3712) 2021-09-24 13:05:42 -04:00
Danny Chan
5515a0d319 [HUDI-2479] HoodieFileIndex throws NPE for FileSlice with pure log files (#3702) 2021-09-23 15:14:30 +08:00
董可伦
5a94043f38 [HUDI-2343] Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only (#3517) 2021-09-21 22:11:52 +08:00
liujinhui
76554aa31a [MINOR] Add document for DataSourceReadOptions (#3653) 2021-09-15 14:33:43 +08:00
liujinhui
9f3c4a2a7f [HUDI-2410] Fix getDefaultBootstrapIndexClass logical error (#3633) 2021-09-13 16:10:17 +08:00
vinoth chandar
ea59a7ff5f [HUDI-2080] Move to ubuntu-18.04 for Azure CI (#3409)
Update Azure CI Ubuntu from 16.04 to 18.04 because 16.04 will be removed soon

Fixed some consistently failing tests

* fix TestCOWDataSourceStorage TestMORDataSourceStorage
* reset mocks

Also update README badge

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2021-09-07 09:44:30 -07:00
wenningd
69cbcc9516 Merge pull request #3541 from rahil-c/rahil-c/HUDI-2359
[HUDI-2359] Add basic "hoodie_is_deleted" unit tests to TestDataSource classes
2021-08-27 16:28:51 -07:00
Satish M
55a80a817d [HUDI-2264] Refactor HoodieSparkSqlWriterSuite to add setup and teardown (#3544) 2021-08-26 10:01:48 -04:00
pengzhiwei
cc5256a7d8 [HUDI-2357] MERGE INTO doesn't work for tables created using CTAS (#3534) 2021-08-26 16:54:41 +08:00
Rahil Chertara
694300477f [HUDI-2359] Add basic "hoodie_is_deleted" unit tests to TestDataSource classes 2021-08-25 16:35:35 -07:00
zhangyue19921010
de94787a85 [HUDI-2345] Hoodie columns sort partitioner for bulk insert (#3523)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-08-24 21:45:17 +08:00
董可伦
be8c1e499f Support referencing subquery with column aliases by table alias in merge into (#3380) 2021-08-21 21:53:16 +08:00
Udit Mehrotra
e39d0a2f28 Keep non-conflicting names for common configs between DataSourceOptions and HoodieWriteConfig (#3511) 2021-08-20 02:42:59 -07:00
pengzhiwei
49829f8822 [HUDI-2339] Create Table If Not Exists Failed After Alter Table (#3510) 2021-08-20 14:21:10 +08:00
Udit Mehrotra
c350d05dd3 Restore 0.8.0 config keys with deprecated annotation (#3506)
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-19 13:36:40 -07:00
Sagar Sumit
37c29e75dc [HUDI-2322] Use correct meta columns while preparing dataset for bulk insert (#3504) 2021-08-19 12:07:12 -04:00
liujinhui
5ee35a0a92 HUDI-1674 (#3488) 2021-08-18 13:45:48 +08:00
Raymond Xu
4d508ef673 [MINOR] Fix SelectPackages in HoodieSparkFunctionalTestSuite (#3476) 2021-08-15 10:17:00 -07:00
Udit Mehrotra
3e301196bf Moving to 0.10.0-SNAPSHOT on master branch. 2021-08-14 18:51:09 -07:00
Y Ethan Guo
23dca6c237 [HUDI-2268] Add upgrade and downgrade to and from 0.9.0 (#3470)
- Added upgrade and downgrade steps to and from 0.9.0. Upgrade adds a few table properties. Downgrade recreates timeline-server-based marker files, if any.
2021-08-14 20:20:23 -04:00
vinoth chandar
18e6b79947 [MINOR] Adding back all old default val members to DataSourceOptions (#3474)
- Added @Deprecated
- Added @deprecated javadoc to keys and defaults, suggesting how to migrate
- Moved all deprecated members to the bottom to improve readability
2021-08-14 14:49:22 -07:00
liujinhui
b7da6cb33d [HUDI-2307] When using delete_partition with ds should not rely on the primary key (#3469)
- Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
2021-08-14 02:53:39 -04:00
Sivabalan Narayanan
642b1b671d [HUDI-2151] Flipping defaults (#3452) 2021-08-13 19:29:22 -04:00
Sagar Sumit
9689278014 [HUDI-1363] Provide option to drop partition columns (#3465)
- Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
2021-08-13 13:01:26 -04:00
董可伦
6602e55cd2 [HUDI-2279] Support column name matching for insert * and update set * in merge into (#3415) 2021-08-13 14:10:07 +08:00
Sagar Sumit
0544d70d8f [MINOR] Deprecate older configs (#3464)
Rename and deprecate props in HoodieWriteConfig

Rename and deprecate older props
2021-08-12 20:31:04 -07:00
liujinhui
c0fc9cdaf3 MINOR (#3459)
Move Hoodie DeltaStreamer to hudi-utilities
2021-08-12 18:19:05 +08:00
Sivabalan Narayanan
c9fa3cffaf [HUDI-1774] Adding support for delete_partitions to spark data source (#3437) 2021-08-11 01:03:01 -04:00
Shawy Geng
a5e496fe23 [HUDI-2292] MOR should not predicate pushdown when reading with payload_combine type (#3443) 2021-08-11 12:17:39 +08:00
swuferhong
5448cdde7e [HUDI-2170] [HUDI-1763] Always choose the latest record for HoodieRecordPayload (#3401) 2021-08-11 10:20:55 +08:00
Sivabalan Narayanan
1196736185 [HUDI-1129] Improving schema evolution support in hudi (#2927)
* Adding support to ingest records with old schema after table's schema is evolved

* Rebasing against latest master

- Trimming test file to be < 800 lines
- Renaming config names

* Addressing feedback

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-10 09:15:37 -07:00
zhangyue19921010
73d898322b [MINOR] Fix travis from errors (#3432) 2021-08-10 08:25:49 -07:00
pengzhiwei
41a9986a76 [HUDI-2208] Support Bulk Insert For Spark Sql (#3328) 2021-08-09 00:18:31 -04:00
pengzhiwei
32a50d8ddb [HUDI-2243] Support Time Travel Query For Hoodie Table (#3360) 2021-08-07 19:07:22 -04:00
pengzhiwei
55d2e786db [HUDI-1842] Spark Sql Support For pre-existing Hoodie Table (#3393) 2021-08-07 07:49:26 -04:00
pengzhiwei
9ce548edb1 [MINOR] fix compile error in compaction command (#3421) 2021-08-06 16:18:19 +08:00
pengzhiwei
3f8ca1a355 [HUDI-2182] Support Compaction Command For Spark Sql (#3277) 2021-08-06 15:12:10 +08:00
pengzhiwei
0dcd6a8fca [HUDI-2233] Use HMS To Sync Hive Meta For Spark Sql (#3387) 2021-08-05 09:57:22 -04:00
Sivabalan Narayanan
1df5ded433 [HUDI-2273] Migrating some long running tests to functional test profile (#3398) 2021-08-04 19:08:50 -04:00
pengzhiwei
5574e092fb [HUDI-2232] [SQL] MERGE INTO fails with table having nested struct (#3379) 2021-08-04 18:20:29 +08:00
wenningd
91bb0d1318 [HUDI-2255] Refactor Datasource options (#3373)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-08-03 17:50:30 -07:00
Udit Mehrotra
1ff2d3459a [HUDI-1371] [HUDI-1893] Support metadata based listing for Spark DataSource and Spark SQL (#2893) 2021-08-03 14:47:40 -07:00