Y Ethan Guo
b8601a9f58
[HUDI-2656] Generalize HoodieIndex for flexible record data type ( #3893 )
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-02-03 20:24:04 -08:00
Alexey Kudinkin
d681824982
[HUDI-3337] Fixing Parquet Column Range metadata extraction ( #4705 )
- The Parquet Column Range metadata extraction utility simplistically assumed that Decimal types are only represented by INT32, while their representation varies depending on precision.
- More details can be found here:
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#DECIMAL
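As an illustration of why a single INT32 assumption breaks, here is a minimal Python sketch of decoding a raw Parquet DECIMAL value. It is not Hudi's code: the function names `decode_parquet_decimal` and `max_precision_for_fixed_len` are hypothetical, and the sketch only assumes what the linked spec states, namely that INT32/INT64 columns carry the unscaled integer directly, while FIXED_LEN_BYTE_ARRAY and BINARY columns carry it as big-endian two's complement bytes.

```python
from decimal import Decimal


def decode_parquet_decimal(raw, scale):
    """Decode a raw Parquet DECIMAL value (hypothetical helper, not a Hudi API).

    raw   -- int for INT32/INT64 columns, bytes for FIXED_LEN_BYTE_ARRAY/BINARY
    scale -- number of digits to the right of the decimal point
    """
    if isinstance(raw, (bytes, bytearray)):
        # Byte-array encodings store the unscaled value as big-endian two's complement.
        unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    else:
        # INT32 / INT64 encodings store the unscaled value directly.
        unscaled = int(raw)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Build the Decimal from (sign, digits, exponent) so no context rounding applies.
    return Decimal((sign, digits, -scale))


def max_precision_for_fixed_len(n_bytes):
    """Largest decimal precision an n-byte array can hold: floor(log10(2^(8n-1) - 1))."""
    return len(str(2 ** (8 * n_bytes - 1) - 1)) - 1
```

For example, `decode_parquet_decimal(12345, 2)` and `decode_parquet_decimal((12345).to_bytes(16, "big", signed=True), 2)` both yield `Decimal("123.45")`, even though the second value would not fit an INT32-only reader's expectations.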
2022-02-02 20:58:05 -05:00
jsbali
7ce0f4522b
[HUDI-2711] Fall back to full table scan for IncrementalRelation if underlying files have been cleared or moved by cleaner ( #3946 )
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-01-31 23:03:18 -05:00
Yann Byron
ecbad9526a
[HUDI-3253] preferred to use the table's own location ( #4608 )
2022-01-29 00:39:42 -08:00
Raymond Xu
0bd38f26ca
[HUDI-2596] Make class names consistent in hudi-client ( #4680 )
2022-01-27 17:05:08 -08:00
Yann Byron
4a9f826382
[HUDI-3215] Solve UT for Spark 3.2 ( #4565 )
2022-01-26 14:48:26 -08:00
xuzifu666
bf409e8423
[MINOR] Standardize HoodieSqlCommon.g4 file ( #4582 )
2022-01-25 10:09:08 +08:00
Yann Byron
26c3f797b0
[HUDI-3237] gracefully fail to change column data type ( #4677 )
2022-01-24 16:33:36 -08:00
Alexey Kudinkin
bc7882cbe9
[HUDI-2872][HUDI-2646] Refactoring layout optimization (clustering) flow to support linear ordering ( #4606 )
Refactoring layout optimization (clustering) flow to
- Enable support for linear (lexicographic) ordering as one of the ordering strategies (along w/ Z-order, Hilbert)
- Reconcile Layout Optimization and Clustering configuration to be more congruent
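To make the difference between the ordering strategies named above concrete, here is a small Python sketch contrasting linear (lexicographic) ordering with a Z-order key on two integer columns. It is an illustration only, not Hudi's clustering code; `z_order_key` and the 16-bit column width are assumptions made for the example.

```python
def z_order_key(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key.

    Hypothetical helper for illustration; x and y are non-negative ints.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i + 1)  # bits of x go to odd positions
        key |= ((y >> i) & 1) << (2 * i)      # bits of y go to even positions
    return key


points = [(0, 3), (1, 0), (1, 1), (0, 1)]

# Linear (lexicographic) ordering: sort by the first column, then the second.
linear_order = sorted(points)

# Z-ordering: sort by the interleaved key, which mixes both columns.
z_order = sorted(points, key=lambda p: z_order_key(*p))
```

Linear ordering keeps rows with the same leading column value together, whereas the Z-order key trades some of that locality for locality on the second column as well.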
2022-01-24 16:53:54 -05:00
董可伦
cfde45b548
[HUDI-3282] Fix delete exception for Spark SQL when sync Hive ( #4644 )
2022-01-23 11:32:57 -08:00
Sivabalan Narayanan
f7a77961e3
[HUDI-1850][HUDI-3234] Fixing read of an empty table but with failed write ( #2903 )
2022-01-23 14:23:21 -05:00
董可伦
56cd8ffae0
[HUDI-2837] Add support for using database name in incremental query ( #4083 )
2022-01-22 22:11:27 -08:00
Y Ethan Guo
4b9085057a
[HUDI-3268] Fix NPE while reading table with Spark datasource ( #4630 )
2022-01-21 08:46:07 -05:00
董可伦
8547f11752
[HUDI-3271] Code optimization and clean up unused code in HoodieSparkSqlWriter ( #4631 )
2022-01-20 18:49:04 -05:00
Yann Byron
31b57a256f
[HUDI-3236] use fields' comments persisted in catalog to fill in schema ( #4587 )
2022-01-19 21:44:35 -08:00
Alexey Kudinkin
4bea758738
[HUDI-3191] Rebasing Hive's FileInputFormat onto AbstractHoodieTableFileIndex ( #4531 )
2022-01-18 14:54:51 -08:00
Thinking Chen
caeea946fb
[HUDI-3245] Convert uppercase letters to lowercase in storage configs ( #4602 )
2022-01-18 14:51:09 -05:00
Yann Byron
a09c231911
[HUDI-2903] get table schema from the last commit with data written ( #4180 )
2022-01-18 10:50:30 -05:00
RexAn
f18447406d
[HUDI-1558] Struct Stream Source Support Spark3 ( #4586 )
Co-authored-by: Hui An <hui.an@shopee.com>
2022-01-18 11:08:33 +08:00
Alexey Kudinkin
75caa7d3d8
[HUDI-3179] Extracted common AbstractHoodieTableFileIndex to be shared across engines ( #4520 )
2022-01-16 22:46:20 -08:00
Yann Byron
d2dda55794
[HUDI-2968] add UT for update/delete on non-pk condition ( #4568 )
2022-01-16 12:02:12 -08:00
Yann Byron
5e0171a5ee
[HUDI-3198] Improve Spark SQL create table from existing hudi table ( #4584 )
Simplify the SQL statement for creating a hudi table based on an existing hudi path.
From:
```sql
create table hudi_tbl using hudi tblproperties (primaryKey='id', preCombineField='ts', type='cow') partitioned by (pt) location '/path/to/hudi'
```
To:
```sql
create table hudi_tbl using hudi location '/path/to/hudi'
```
2022-01-14 10:15:29 -08:00
leesf
5ce45c440b
[HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation ( #4514 )
* Introduce hudi-spark3-common and hudi-spark2-common modules to hold classes that are reused across Spark versions; also introduce hudi-spark3.1.x to support Spark 3.1.x.
* Introduce hudi format under hudi-spark2, hudi-spark3, hudi-spark3.1.x modules and change the hudi format in original hudi-spark module to hudi_v1 format.
* Manually tested on Spark 3.1.2 and Spark 3.2.0 SQL.
* Added a README.md file under hudi-spark-datasource module.
2022-01-14 13:42:35 +08:00
董可伦
017ddbbfac
[MINOR] Fix typos ( #4567 )
2022-01-11 23:17:10 -08:00
Yann Byron
36790709f7
[HUDI-3125] spark-sql write timestamp directly ( #4471 )
2022-01-08 23:43:25 -08:00
Sagar Sumit
827549949c
[HUDI-2909] Handle logical type in TimestampBasedKeyGenerator ( #4203 )
The timestamp-based key generator was returning different values for the row-writer and non-row-writer paths. This patch fixes it; the fix is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`).
2022-01-08 10:22:44 -05:00
Raymond Xu
2467c137e4
[HUDI-3100] Add config for hive conditional sync ( #4440 )
2022-01-06 23:26:35 -08:00
Vinish Reddy
eee715b3ff
[HUDI-3168] Fixing null schema with empty commit in incremental relation ( #4513 )
2022-01-05 11:43:10 -05:00
Sivabalan Narayanan
7329d229d5
Adding tests to validate different key generators ( #4473 )
2022-01-04 10:48:04 +05:30
leesf
29ab6fb9ad
[HUDI-3140] Fix bulk_insert failure on Spark 3.2.0 ( #4498 )
2022-01-04 09:59:59 +08:00
harshal
2b2ae34cb9
[HUDI-2558] Fixing Clustering w/ sort columns with null values fails ( #4404 )
2022-01-03 12:19:43 +05:30
Yann Byron
fe9406dd33
[HUDI-3131] fix ctas error in spark3.1.1 ( #4476 )
2022-01-02 03:06:55 -08:00
Yann Byron
1622b52c9c
[HUDI-3136] Fix merge/insert/show partitions error on Spark3.2 ( #4490 )
2022-01-02 02:42:10 -08:00
Shawy Geng
a4e622ac61
[HUDI-1951] Add bucket hash index, compatible with the hive bucket ( #3173 )
* [HUDI-2154] Add index key field to HoodieKey
* [HUDI-2157] Add the bucket index and its read/write implementation for the Spark engine.
* revert HUDI-2154 add index key field to HoodieKey
* fix all comments and introduce a new tricky way to get index key at runtime
support double insert for bucket index
* revert spark read optimizer based on bucket index
* add the storage layout
* index tag, hash function and add ut
* fix ut
* address partial comments
* Code review feedback
* add layout config and docs
* fix ut
* rename hoodie.layout and rebase master
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-12-30 12:38:26 -08:00
ForwardXu
504747ecf4
[HUDI-3108] Fix error caused by purge drop of MOR table ( #4455 )
2021-12-29 20:23:23 +08:00
Yann Byron
05942e018c
[HUDI-2811] Support Spark 3.2 ( #4270 )
2021-12-28 00:12:44 -08:00
Yann Byron
1f7afba5e4
[HUDI-3093] fix spark-sql query table that write with TimestampBasedKeyGenerator ( #4416 )
2021-12-27 21:39:52 -08:00
ForwardXu
282aa68552
[HUDI-3099] Purge drop partition for spark sql ( #4436 )
2021-12-28 09:38:26 +08:00
xuzifu666
032b883bd1
[HUDI-3014] Add table option to set utc timezone ( #4306 )
2021-12-23 16:27:45 +08:00
ForwardXu
5d93edc539
[HUDI-3060] drop table for spark sql ( #4364 )
2021-12-22 19:17:43 +08:00
harshal patil
7d046f914a
[HUDI-3008] Fixing HoodieFileIndex partition column parsing for nested fields
2021-12-21 11:54:52 +05:30
xuzifu666
3ca92108b2
remove unused import ( #4349 )
2021-12-20 16:32:41 +08:00
Sivabalan Narayanan
03f71ef1a2
[HUDI-2970] Adding tests for archival of replace commit actions ( #4268 )
2021-12-18 23:59:39 -08:00
xiarixiaoyao
9246b16492
[HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType ( #4253 )
2021-12-17 08:58:02 -05:00
xiarixiaoyao
294d712948
[HUDI-3001] Clean up the marker directory when finish bootstrap operation. ( #4298 )
2021-12-16 12:36:01 -08:00
ForwardXu
dd96129191
[HUDI-2990] Sync to HMS when deleting partitions ( #4291 )
2021-12-13 20:40:06 +08:00
Alexey Kudinkin
2d864f7524
[HUDI-2814] Make Z-index more generic Column-Stats Index ( #4106 )
2021-12-10 14:56:09 -08:00
xiarixiaoyao
68f8597b12
[HUDI-2966] Add TaskCompletionListener for HoodieMergeOnReadRDD to close logScanner when the query finishes. ( #4265 )
2021-12-09 19:51:49 +08:00
Yann Byron
2f96f4300b
Revert "[HUDI-2495] Resolve inconsistent key generation for timestamp types by GenericRecord and Row ( #3944 )" ( #4201 )
2021-12-03 11:13:38 -05:00
Alexey Kudinkin
bed7f9897a
[HUDI-2911] Removing default value for PARTITIONPATH_FIELD_NAME resulting in incorrect KeyGenerator configuration ( #4195 )
2021-12-03 07:33:38 -05:00