Yann Byron
4a9f826382
[HUDI-3215] Solve UT for Spark 3.2 (#4565)
2022-01-26 14:48:26 -08:00
xuzifu666
bf409e8423
[MINOR] Standardize HoodieSqlCommon.g4 file (#4582)
2022-01-25 10:09:08 +08:00
Yann Byron
26c3f797b0
[HUDI-3237] gracefully fail to change column data type (#4677)
2022-01-24 16:33:36 -08:00
Alexey Kudinkin
bc7882cbe9
[HUDI-2872][HUDI-2646] Refactoring layout optimization (clustering) flow to support linear ordering (#4606)
Refactoring layout optimization (clustering) flow to:
- Enable support for linear (lexicographic) ordering as one of the ordering strategies (along with Z-order, Hilbert)
- Reconcile Layout Optimization and Clustering configuration to be more congruent
2022-01-24 16:53:54 -05:00
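The difference between the linear (lexicographic) and Z-order strategies named in the entry above can be illustrated with a small standalone sketch. This is not Hudi's implementation; the function names, the two-column integer keys, and the 8-bit width are assumptions made only for illustration.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def interleave_bits(x: int, y: int, bits: int = 8) -> int:
    """Build a Z-order (Morton) key by interleaving the low `bits` bits of x and y.

    Bits of x land in even positions, bits of y in odd positions.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def linear_order(points: List[Point]) -> List[Point]:
    """Lexicographic ordering: sort by the first column, then the second."""
    return sorted(points)


def z_order(points: List[Point]) -> List[Point]:
    """Z-ordering: sort by the interleaved-bit key of both columns."""
    return sorted(points, key=lambda p: interleave_bits(p[0], p[1]))


points = [(0, 3), (1, 0), (0, 0), (1, 1), (3, 0), (0, 1)]
print(linear_order(points))
print(z_order(points))
```

Lexicographic ordering keeps rows with equal first-column values adjacent, while Z-ordering tends to keep rows that are close in both columns near each other.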
董可伦
cfde45b548
[HUDI-3282] Fix delete exception for Spark SQL when sync Hive (#4644)
2022-01-23 11:32:57 -08:00
Sivabalan Narayanan
f7a77961e3
[HUDI-1850][HUDI-3234] Fixing read of a empty table but with failed write (#2903)
2022-01-23 14:23:21 -05:00
董可伦
56cd8ffae0
[HUDI-2837] Add support for using database name in incremental query (#4083)
2022-01-22 22:11:27 -08:00
Y Ethan Guo
4b9085057a
[HUDI-3268] Fix NPE while reading table with Spark datasource (#4630)
2022-01-21 08:46:07 -05:00
董可伦
8547f11752
[HUDI-3271] Code optimization and clean up unused code in HoodieSparkSqlWriter (#4631)
2022-01-20 18:49:04 -05:00
Yann Byron
31b57a256f
[HUDI-3236] use fields'comments persisted in catalog to fill in schema (#4587)
2022-01-19 21:44:35 -08:00
Alexey Kudinkin
4bea758738
[HUDI-3191] Rebasing Hive's FileInputFormat onto AbstractHoodieTableFileIndex (#4531)
2022-01-18 14:54:51 -08:00
Thinking Chen
caeea946fb
[HUDI-3245] Convert uppercase letters to lowercase in storage configs (#4602)
2022-01-18 14:51:09 -05:00
Yann Byron
a09c231911
[HUDI-2903] get table schema from the last commit with data written (#4180)
2022-01-18 10:50:30 -05:00
RexAn
f18447406d
[HUDI-1558] Struct Stream Source Support Spark3 (#4586)
Co-authored-by: Hui An <hui.an@shopee.com>
2022-01-18 11:08:33 +08:00
Alexey Kudinkin
75caa7d3d8
[HUDI-3179] Extracted common AbstractHoodieTableFileIndex to be shared across engines (#4520)
2022-01-16 22:46:20 -08:00
Yann Byron
d2dda55794
[HUDI-2968] add UT for update/delete on non-pk condition (#4568)
2022-01-16 12:02:12 -08:00
Yann Byron
5e0171a5ee
[HUDI-3198] Improve Spark SQL create table from existing hudi table (#4584)
Simplify the SQL statement for creating a hudi table from an existing hudi path.
From:
```sql
create table hudi_tbl using hudi tblproperties (primaryKey='id', preCombineField='ts', type='cow') partitioned by (pt) location '/path/to/hudi'
```
To:
```sql
create table hudi_tbl using hudi location '/path/to/hudi'
```
2022-01-14 10:15:29 -08:00
leesf
5ce45c440b
[HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation (#4514)
* Introduce hudi-spark3-common and hudi-spark2-common modules to hold classes that are reused across Spark versions, and introduce hudi-spark3.1.x to support Spark 3.1.x.
* Introduce the hudi format under the hudi-spark2, hudi-spark3, and hudi-spark3.1.x modules, and change the hudi format in the original hudi-spark module to the hudi_v1 format.
* Manually tested on Spark 3.1.2 and Spark 3.2.0 SQL.
* Added a README.md file under the hudi-spark-datasource module.
2022-01-14 13:42:35 +08:00
董可伦
017ddbbfac
[MINOR] Fix typos (#4567)
2022-01-11 23:17:10 -08:00
Yann Byron
36790709f7
[HUDI-3125] spark-sql write timestamp directly (#4471)
2022-01-08 23:43:25 -08:00
Sagar Sumit
827549949c
[HUDI-2909] Handle logical type in TimestampBasedKeyGenerator (#4203)
The timestamp-based key generator was returning different values for the row-writer and non-row-writer paths. This patch fixes that, and the fix is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`).
2022-01-08 10:22:44 -05:00
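A minimal sketch of how the flag named in the HUDI-2909 entry above might be added to a set of writer options. The helper name, the `"true"`/`"false"` string values, and the sample table name are assumptions for illustration only; the commit message confirms only the flag name.

```python
from typing import Dict

# Flag name taken verbatim from the HUDI-2909 commit message.
CONSISTENT_LOGICAL_TS_FLAG = (
    "hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled"
)


def with_consistent_logical_timestamp(
    options: Dict[str, str], enabled: bool = True
) -> Dict[str, str]:
    """Return a copy of `options` with the flag set (hypothetical helper)."""
    merged = dict(options)  # leave the caller's dict untouched
    merged[CONSISTENT_LOGICAL_TS_FLAG] = "true" if enabled else "false"
    return merged


# "hudi_tbl" is a placeholder table name.
writer_options = with_consistent_logical_timestamp({"hoodie.table.name": "hudi_tbl"})
print(writer_options)
```

With PySpark, a dict like this would typically be passed to the writer via `.options(**writer_options)`.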
Raymond Xu
2467c137e4
[HUDI-3100] Add config for hive conditional sync (#4440)
2022-01-06 23:26:35 -08:00
Vinish Reddy
eee715b3ff
[HUDI-3168] Fixing null schema with empty commit in incremental relation (#4513)
2022-01-05 11:43:10 -05:00
Sivabalan Narayanan
7329d229d5
Adding tests to validate different key generators (#4473)
2022-01-04 10:48:04 +05:30
leesf
29ab6fb9ad
[HUDI-3140] Fix bulk_insert failure on Spark 3.2.0 (#4498)
2022-01-04 09:59:59 +08:00
harshal
2b2ae34cb9
[HUDI-2558] Fixing Clustering w/ sort columns with null values fails (#4404)
2022-01-03 12:19:43 +05:30
Yann Byron
fe9406dd33
[HUDI-3131] fix ctas error in spark3.1.1 (#4476)
2022-01-02 03:06:55 -08:00
Yann Byron
1622b52c9c
[HUDI-3136] Fix merge/insert/show partitions error on Spark3.2 (#4490)
2022-01-02 02:42:10 -08:00
Shawy Geng
a4e622ac61
[HUDI-1951] Add bucket hash index, compatible with the hive bucket (#3173)
* [HUDI-2154] Add index key field to HoodieKey
* [HUDI-2157] Add the bucket index and its read/write implementation of Spark engine.
* revert HUDI-2154 add index key field to HoodieKey
* fix all comments and introduce a new tricky way to get index key at runtime
support double insert for bucket index
* revert spark read optimizer based on bucket index
* add the storage layout
* index tag, hash function and add ut
* fix ut
* address partial comments
* Code review feedback
* add layout config and docs
* fix ut
* rename hoodie.layout and rebase master
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-12-30 12:38:26 -08:00
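The general idea behind a bucket hash index, as named in the HUDI-1951 entry above, is to map each record key deterministically to one of a fixed number of buckets. The sketch below is not Hudi's or Hive's code: the function names are hypothetical, and the Java-style string hash followed by a non-negative modulo is an assumption chosen only to illustrate the technique.

```python
from typing import Dict, Iterable, List


def java_string_hash(s: str) -> int:
    """Java-style String.hashCode(): h = 31*h + c, wrapped to a signed 32-bit int.

    Note: uses Python code points, which match Java's UTF-16 units only for
    characters in the Basic Multilingual Plane.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap to 32 bits
    return h - 0x100000000 if h >= 0x80000000 else h


def bucket_id(record_key: str, num_buckets: int) -> int:
    """Map a record key to a bucket in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # Mask off the sign bit so the modulo result is never negative.
    return (java_string_hash(record_key) & 0x7FFFFFFF) % num_buckets


def assign_buckets(keys: Iterable[str], num_buckets: int) -> Dict[int, List[str]]:
    """Group record keys by their bucket id."""
    buckets: Dict[int, List[str]] = {}
    for key in keys:
        buckets.setdefault(bucket_id(key, num_buckets), []).append(key)
    return buckets


print(assign_buckets(["a", "ab", "abc"], 4))
```

Because the mapping depends only on the key and the bucket count, a writer can locate the target bucket for an update without a lookup, at the cost of fixing the number of buckets up front.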
ForwardXu
504747ecf4
[HUDI-3108] Fix Purge Drop MOR Table Cause error (#4455)
2021-12-29 20:23:23 +08:00
Yann Byron
05942e018c
[HUDI-2811] Support Spark 3.2 (#4270)
2021-12-28 00:12:44 -08:00
Yann Byron
1f7afba5e4
[HUDI-3093] fix spark-sql query table that write with TimestampBasedKeyGenerator (#4416)
2021-12-27 21:39:52 -08:00
ForwardXu
282aa68552
[HUDI-3099] Purge drop partition for spark sql (#4436)
2021-12-28 09:38:26 +08:00
xuzifu666
032b883bd1
[HUDI-3014] Add table option to set utc timezone (#4306)
2021-12-23 16:27:45 +08:00
ForwardXu
5d93edc539
[HUDI-3060] drop table for spark sql (#4364)
2021-12-22 19:17:43 +08:00
harshal patil
7d046f914a
[HUDI-3008] Fixing HoodieFileIndex partition column parsing for nested fields
2021-12-21 11:54:52 +05:30
xuzifu666
3ca92108b2
remove unused import (#4349)
2021-12-20 16:32:41 +08:00
Sivabalan Narayanan
03f71ef1a2
[HUDI-2970] Adding tests for archival of replace commit actions (#4268)
2021-12-18 23:59:39 -08:00
xiarixiaoyao
9246b16492
[HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType (#4253)
2021-12-17 08:58:02 -05:00
xiarixiaoyao
294d712948
[HUDI-3001] Clean up the marker directory when finish bootstrap operation. (#4298)
2021-12-16 12:36:01 -08:00
ForwardXu
dd96129191
[HUDI-2990] Sync to HMS when deleting partitions (#4291)
2021-12-13 20:40:06 +08:00
Alexey Kudinkin
2d864f7524
[HUDI-2814] Make Z-index more generic Column-Stats Index (#4106)
2021-12-10 14:56:09 -08:00
xiarixiaoyao
68f8597b12
[HUDI-2966] Add TaskCompletionListener for HoodieMergeOnReadRDD to close logScaner when the query finished. (#4265)
2021-12-09 19:51:49 +08:00
Yann Byron
2f96f4300b
Revert "[HUDI-2495] Resolve inconsistent key generation for timestamp types by GenericRecord and Row (#3944)" (#4201)
2021-12-03 11:13:38 -05:00
Alexey Kudinkin
bed7f9897a
[HUDI-2911] Removing default value for PARTITIONPATH_FIELD_NAME resulting in incorrect KeyGenerator configuration (#4195)
2021-12-03 07:33:38 -05:00
Yann Byron
ca427240c0
[MINOR] use catalog schema if can not find table schema (#4182)
2021-12-03 00:37:13 -08:00
zzzhy
61a03bc072
[MINOR] Fix the wrong usage of timestamp length variable bug (#4179)
Signed-off-by: zzzhy <candle_1667@163.com>
2021-12-02 22:47:31 +08:00
董可伦
a398aad1fc
[HUDI-2642] Add support ignoring case in update sql operation (#3882)
2021-11-29 22:36:36 -08:00
董可伦
3433f00cb5
[MINOR] Fix typo,rename 'getUrlEncodePartitoning' to 'getUrlEncodePartitioning' (#4130)
2021-11-29 18:31:22 -08:00
Sivabalan Narayanan
38e75ea806
Removing rfc from release package and fixing release validation script (#4147)
2021-11-29 13:18:35 +08:00