
244 Commits

Author SHA1 Message Date
jsbali
7ce0f4522b [HUDI-2711] Fallback to fulltable scan for IncrementalRelation if underlying files have been cleared or moved by cleaner (#3946)
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-01-31 23:03:18 -05:00
Yann Byron
ecbad9526a [HUDI-3253] preferred to use the table's own location (#4608) 2022-01-29 00:39:42 -08:00
Raymond Xu
0bd38f26ca [HUDI-2596] Make class names consistent in hudi-client (#4680) 2022-01-27 17:05:08 -08:00
Yann Byron
4a9f826382 [HUDI-3215] Solve UT for Spark 3.2 (#4565) 2022-01-26 14:48:26 -08:00
xuzifu666
bf409e8423 [MINOR] Standardize HoodieSqlCommon.g4 file (#4582) 2022-01-25 10:09:08 +08:00
Yann Byron
26c3f797b0 [HUDI-3237] gracefully fail to change column data type (#4677) 2022-01-24 16:33:36 -08:00
Alexey Kudinkin
bc7882cbe9 [HUDI-2872][HUDI-2646] Refactoring layout optimization (clustering) flow to support linear ordering (#4606)
Refactoring layout optimization (clustering) flow to
- Enable support for linear (lexicographic) ordering as one of the ordering strategies (along w/ Z-order, Hilbert)
- Reconcile Layout Optimization and Clustering configuration to be more congruent
2022-01-24 16:53:54 -05:00
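
Note on the ordering strategies named in the commit above: the following is a minimal, illustrative sketch of how a linear (lexicographic) sort differs from a Z-order sort on two integer columns. It is not Hudi's implementation; the function name `interleave_bits`, the sample `points`, and the 8-bit width are all assumptions made for the example.

```python
def interleave_bits(x: int, y: int, bits: int = 8) -> int:
    """Return the Z-order (Morton) value of two non-negative integers.

    Hypothetical helper for illustration only; not taken from Hudi.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)  # bits of x go to odd positions
        z |= ((y >> i) & 1) << (2 * i)      # bits of y go to even positions
    return z


# Sample (col_a, col_b) pairs standing in for two sort columns.
points = [(0, 3), (1, 0), (2, 1), (0, 1), (3, 0), (1, 2)]

# Linear ordering: compare col_a first, then col_b.
linear = sorted(points)

# Z-order: compare the interleaved bits, so both columns share the ordering.
z_order = sorted(points, key=lambda p: interleave_bits(p[0], p[1]))

print("linear :", linear)
print("z-order:", z_order)
```

Linear ordering keeps rows with the same leading column value adjacent, which suits filters on that column. Z-order trades some of that locality for locality across both columns.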
董可伦
cfde45b548 [HUDI-3282] Fix delete exception for Spark SQL when sync Hive (#4644) 2022-01-23 11:32:57 -08:00
Sivabalan Narayanan
f7a77961e3 [HUDI-1850][HUDI-3234] Fixing read of a empty table but with failed write (#2903) 2022-01-23 14:23:21 -05:00
董可伦
56cd8ffae0 [HUDI-2837] Add support for using database name in incremental query (#4083) 2022-01-22 22:11:27 -08:00
Y Ethan Guo
4b9085057a [HUDI-3268] Fix NPE while reading table with Spark datasource (#4630) 2022-01-21 08:46:07 -05:00
董可伦
8547f11752 [HUDI-3271] Code optimization and clean up unused code in HoodieSparkSqlWriter (#4631) 2022-01-20 18:49:04 -05:00
Yann Byron
31b57a256f [HUDI-3236] use fields'comments persisted in catalog to fill in schema (#4587) 2022-01-19 21:44:35 -08:00
Alexey Kudinkin
4bea758738 [HUDI-3191] Rebasing Hive's FileInputFormat onto AbstractHoodieTableFileIndex (#4531) 2022-01-18 14:54:51 -08:00
Thinking Chen
caeea946fb [HUDI-3245] Convert uppercase letters to lowercase in storage configs (#4602) 2022-01-18 14:51:09 -05:00
Yann Byron
a09c231911 [HUDI-2903] get table schema from the last commit with data written (#4180) 2022-01-18 10:50:30 -05:00
RexAn
f18447406d [HUDI-1558] Struct Stream Source Support Spark3 (#4586)
Co-authored-by: Hui An <hui.an@shopee.com>
2022-01-18 11:08:33 +08:00
Alexey Kudinkin
75caa7d3d8 [HUDI-3179] Extracted common AbstractHoodieTableFileIndex to be shared across engines (#4520) 2022-01-16 22:46:20 -08:00
Yann Byron
d2dda55794 [HUDI-2968] add UT for update/delete on non-pk condition (#4568) 2022-01-16 12:02:12 -08:00
Yann Byron
5e0171a5ee [HUDI-3198] Improve Spark SQL create table from existing hudi table (#4584)
Simplifies the SQL statement for creating a Hudi table from an existing Hudi path.

From:

```sql
create table hudi_tbl using hudi tblproperties (primaryKey='id', preCombineField='ts', type='cow') partitioned by (pt) location '/path/to/hudi'
```

To:
```sql
create table hudi_tbl using hudi location '/path/to/hudi'
```
2022-01-14 10:15:29 -08:00
leesf
5ce45c440b [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation (#4514)
* Introduce hudi-spark3-common and hudi-spark2-common modules to place classes that would be reused in different spark versions, also introduce hudi-spark3.1.x to support spark 3.1.x.
* Introduce hudi format under hudi-spark2, hudi-spark3, hudi-spark3.1.x modules and change the hudi format in original hudi-spark module to hudi_v1 format.
* Manually tested on Spark 3.1.2 and Spark 3.2.0 SQL.
* Added a README.md file under hudi-spark-datasource module.
2022-01-14 13:42:35 +08:00
董可伦
017ddbbfac [MINOR] Fix typos (#4567) 2022-01-11 23:17:10 -08:00
Yann Byron
36790709f7 [HUDI-3125] spark-sql write timestamp directly (#4471) 2022-01-08 23:43:25 -08:00
Sagar Sumit
827549949c [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator (#4203)
The timestamp-based key generator was returning different values for the row-writer and non-row-writer paths. This patch fixes that; the fix is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`).
2022-01-08 10:22:44 -05:00
Raymond Xu
2467c137e4 [HUDI-3100] Add config for hive conditional sync (#4440) 2022-01-06 23:26:35 -08:00
Vinish Reddy
eee715b3ff [HUDI-3168] Fixing null schema with empty commit in incremental relation (#4513) 2022-01-05 11:43:10 -05:00
Sivabalan Narayanan
7329d229d5 Adding tests to validate different key generators (#4473) 2022-01-04 10:48:04 +05:30
leesf
29ab6fb9ad [HUDI-3140] Fix bulk_insert failure on Spark 3.2.0 (#4498) 2022-01-04 09:59:59 +08:00
harshal
2b2ae34cb9 [HUDI-2558] Fixing Clustering w/ sort columns with null values fails (#4404) 2022-01-03 12:19:43 +05:30
Yann Byron
fe9406dd33 [HUDI-3131] fix ctas error in spark3.1.1 (#4476) 2022-01-02 03:06:55 -08:00
Yann Byron
1622b52c9c [HUDI-3136] Fix merge/insert/show partitions error on Spark3.2 (#4490) 2022-01-02 02:42:10 -08:00
Shawy Geng
a4e622ac61 [HUDI-1951] Add bucket hash index, compatible with the hive bucket (#3173)
* [HUDI-2154] Add index key field to HoodieKey

* [HUDI-2157] Add the bucket index and its read/write implementation for the Spark engine.
* revert HUDI-2154 add index key field to HoodieKey
* fix all comments and introduce a new tricky way to get index key at runtime
support double insert for bucket index
* revert spark read optimizer based on bucket index
* add the storage layout
* index tag, hash function and add ut
* fix ut
* address partial comments
* Code review feedback
* add layout config and docs
* fix ut
* rename hoodie.layout and rebase master

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-12-30 12:38:26 -08:00
ForwardXu
504747ecf4 [HUDI-3108] Fix Purge Drop MOR Table Cause error (#4455) 2021-12-29 20:23:23 +08:00
Yann Byron
05942e018c [HUDI-2811] Support Spark 3.2 (#4270) 2021-12-28 00:12:44 -08:00
Yann Byron
1f7afba5e4 [HUDI-3093] fix spark-sql query table that write with TimestampBasedKeyGenerator (#4416) 2021-12-27 21:39:52 -08:00
ForwardXu
282aa68552 [HUDI-3099] Purge drop partition for spark sql (#4436) 2021-12-28 09:38:26 +08:00
xuzifu666
032b883bd1 [HUDI-3014] Add table option to set utc timezone (#4306) 2021-12-23 16:27:45 +08:00
ForwardXu
5d93edc539 [HUDI-3060] drop table for spark sql (#4364) 2021-12-22 19:17:43 +08:00
harshal patil
7d046f914a [HUDI-3008] Fixing HoodieFileIndex partition column parsing for nested fields 2021-12-21 11:54:52 +05:30
xuzifu666
3ca92108b2 remove unused import (#4349) 2021-12-20 16:32:41 +08:00
Sivabalan Narayanan
03f71ef1a2 [HUDI-2970] Adding tests for archival of replace commit actions (#4268) 2021-12-18 23:59:39 -08:00
xiarixiaoyao
9246b16492 [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType (#4253) 2021-12-17 08:58:02 -05:00
xiarixiaoyao
294d712948 [HUDI-3001] Clean up the marker directory when finish bootstrap operation. (#4298) 2021-12-16 12:36:01 -08:00
ForwardXu
dd96129191 [HUDI-2990] Sync to HMS when deleting partitions (#4291) 2021-12-13 20:40:06 +08:00
Alexey Kudinkin
2d864f7524 [HUDI-2814] Make Z-index more generic Column-Stats Index (#4106) 2021-12-10 14:56:09 -08:00
xiarixiaoyao
68f8597b12 [HUDI-2966] Add TaskCompletionListener for HoodieMergeOnReadRDD to close logScaner when the query finished. (#4265)
2021-12-09 19:51:49 +08:00
Yann Byron
2f96f4300b Revert "[HUDI-2495] Resolve inconsistent key generation for timestamp types by GenericRecord and Row (#3944)" (#4201) 2021-12-03 11:13:38 -05:00
Alexey Kudinkin
bed7f9897a [HUDI-2911] Removing default value for PARTITIONPATH_FIELD_NAME resulting in incorrect KeyGenerator configuration (#4195) 2021-12-03 07:33:38 -05:00
Yann Byron
ca427240c0 [MINOR] use catalog schema if can not find table schema (#4182) 2021-12-03 00:37:13 -08:00
zzzhy
61a03bc072 [MINOR] Fix the wrong usage of timestamp length variable bug (#4179)
Signed-off-by: zzzhy <candle_1667@163.com>
2021-12-02 22:47:31 +08:00