Sagar Sumit
5804ad8e32
[HUDI-1483] Support async clustering for deltastreamer and Spark streaming ( #3142 )
- Integrate async clustering service with HoodieDeltaStreamer and HoodieStreamingSink
- Added methods in HoodieAsyncService to reuse code
2021-07-11 14:43:38 -04:00
Sivabalan Narayanan
16e90d30ea
[HUDI-1105] Adding dedup support for Bulk Insert w/ Rows ( #2206 )
2021-07-07 17:38:26 -04:00
Sivabalan Narayanan
ea9e5d0e8b
[HUDI-1104] Adding support for UserDefinedPartitioners and SortModes to BulkInsert with Rows ( #3149 )
2021-07-07 11:15:25 -04:00
pengzhiwei
287d2dd79c
[HUDI-2131] Exception Thrown When MergeInto With Decimal Type Field ( #3224 )
2021-07-05 22:28:57 +08:00
xiarixiaoyao
2cecb75187
[HUDI-2058] Support incremental query for insert_overwrite_table/insert_overwrite operations on COW tables ( #3139 )
2021-07-05 18:54:05 +08:00
xiarixiaoyao
6a71412f78
[HUDI-2116] Support batch synchronization of partition data to Hive metastore to avoid OOM problems ( #3209 )
2021-07-04 22:30:36 +08:00
pengzhiwei
4f215e2938
[HUDI-2057] CTAS Generates An External Table When Creating A Managed Table ( #3146 )
2021-07-03 15:55:36 +08:00
pengzhiwei
70d9c2e747
[HUDI-2123] Exception When Merge With Null-Value Field ( #3214 )
2021-07-02 22:46:52 +08:00
pengzhiwei
ac65189458
[HUDI-2114] Spark Query MOR Table Written By Flink Return Incorrect Timestamp Value ( #3208 )
2021-07-02 17:39:57 +08:00
pengzhiwei
6403547431
[HUDI-2051] Enable Hive Sync When Spark Enable Hive Meta For Spark Sql ( #3126 )
2021-07-02 01:08:36 -07:00
wenningd
d412fb2fe6
[HUDI-89] Add configOption & refactor all configs based on that ( #2833 )
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2021-06-30 14:26:30 -07:00
pengzhiwei
84dd3ca18b
[HUDI-2053] Insert Static Partition With DateType Return Incorrect Partition Value ( #3133 )
2021-06-24 19:09:37 +08:00
pengzhiwei
7e50f9a5a6
[HUDI-2061] Incorrect Schema Inference For Schema Evolved Table ( #3137 )
2021-06-23 22:48:01 -07:00
pengzhiwei
69c0d9e2d0
[HUDI-1883] Support Truncate Table For Hoodie ( #3098 )
2021-06-22 22:33:20 +08:00
pengzhiwei
4fd8a88b7e
[HUDI-1776] Support AlterCommand For Hoodie ( #3086 )
2021-06-21 22:58:43 +08:00
pengzhiwei
b9e28e5292
[HUDI-2033] ClassCastException Thrown When PreCombineField Is String Type ( #3099 )
2021-06-17 23:21:20 +08:00
pengzhiwei
ad53cf450e
[HUDI-1879] Fix RO Tables Returning Snapshot Result ( #2925 )
2021-06-17 04:18:21 -07:00
Jintao Guan
b8fe5b91d5
[HUDI-764] [HUDI-765] ORC reader writer Implementation ( #2999 )
Co-authored-by: Qingyun (Teresa) Kang <kteresa@uber.com >
2021-06-15 15:21:43 -07:00
Sivabalan Narayanan
7d9f9d7d82
[HUDI-1991] Fixing drop dups exception in bulk insert row writer path ( #3055 )
2021-06-14 09:55:52 +08:00
wangxianghu
7261f08507
[HUDI-1929] Support configuring KeyGenerator by type ( #2993 )
2021-06-08 09:26:10 -04:00
pengzhiwei
f760ec543e
[HUDI-1659] Basic Implement Of Spark Sql Support For Hoodie ( #2645 )
Main functions:
Support create table for hoodie.
Support CTAS.
Support Insert for hoodie, including dynamic partition and static partition inserts.
Support MergeInto for hoodie.
Support DELETE.
Support UPDATE.
All of the above support both spark2 & spark3, based on DataSourceV1.
Main changes:
Add sql parser for spark2.
Add HoodieAnalysis for sql resolve and logical plan rewrite.
Add command implementations for CREATE TABLE, INSERT, MERGE INTO & CTAS.
To push down the update & insert logic to the HoodieRecordPayload for MergeInto, I made some changes to the HoodieWriteHandler and other related classes:
1. Add the inputSchema for parsing the incoming record. This is needed because the inputSchema for MergeInto differs from the writeSchema, as there are some transforms in the update & insert expressions.
2. Add WRITE_SCHEMA to HoodieWriteConfig to pass the write schema for merge into.
3. Pass properties to HoodieRecordPayload#getInsertValue to pass the insert expression and table schema.
Verify this pull request
Add TestCreateTable to test creating hoodie tables and CTAS.
Add TestInsertTable to test inserting into hoodie tables.
Add TestMergeIntoTable to test merging into hoodie tables.
Add TestUpdateTable to test updating hoodie tables.
Add TestDeleteTable to test deleting from hoodie tables.
Add TestSqlStatement to test the currently supported DDL/DML.
2021-06-07 23:24:32 -07:00
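The MergeInto push-down described in the commit above (evaluating the update/insert expressions inside the payload, with an input schema that differs from the write schema) can be illustrated with a minimal Java sketch. All names here (`MergeIntoPayloadSketch`, `combine`, the column names) are hypothetical; this is not Hudi's `HoodieRecordPayload` API, just a model of the idea with records as column maps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the MergeInto push-down idea; not Hudi's HoodieRecordPayload API.
final class MergeIntoPayloadSketch {
    // Maps an incoming row (input schema) to the columns to overwrite on a matched record.
    private final UnaryOperator<Map<String, Object>> updateExpr;
    // Maps an incoming row (input schema) to a full new record (write schema) when not matched.
    private final UnaryOperator<Map<String, Object>> insertExpr;

    MergeIntoPayloadSketch(UnaryOperator<Map<String, Object>> updateExpr,
                           UnaryOperator<Map<String, Object>> insertExpr) {
        this.updateExpr = updateExpr;
        this.insertExpr = insertExpr;
    }

    // Combines the incoming row with the stored record, if one exists for the same key.
    Map<String, Object> combine(Optional<Map<String, Object>> stored, Map<String, Object> incoming) {
        if (stored.isPresent()) {
            // WHEN MATCHED THEN UPDATE: start from the stored record, overwrite the updated columns.
            Map<String, Object> merged = new HashMap<>(stored.get());
            merged.putAll(updateExpr.apply(incoming));
            return merged;
        }
        // WHEN NOT MATCHED THEN INSERT: evaluate the insert expression over the incoming row.
        return insertExpr.apply(incoming);
    }
}
```

Here the incoming row carries `new_price` while the table stores `price`, which is why the sketch keeps the two schemas distinct, mirroring the commit's reason for adding a separate inputSchema.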
Vinay Patil
2a7e1e091e
[HUDI-1942] Add Default value for HIVE_AUTO_CREATE_DATABASE_OPT_KEY in HoodieSparkSqlWriter ( #3036 )
2021-06-05 18:02:26 -04:00
pengzhiwei
dcd7c331dc
[HUDI-1879] Support Partition Prune For MergeOnRead Snapshot Table ( #2926 )
2021-05-29 07:50:24 -07:00
wangxianghu
e7020748b5
[HUDI-1920] Set archived as the default value of HOODIE_ARCHIVELOG_FOLDER_PROP_NAME ( #2978 )
2021-05-25 16:29:55 +08:00
mpouttu
369a849337
[HUDI-1873] collect() call causing issues with very large upserts ( #2907 )
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com >
2021-05-24 01:29:01 -04:00
Sivabalan Narayanan
5d1f592395
[HUDI-1806] Honoring skipROSuffix in spark ds ( #2882 )
* Honoring skipROSuffix in spark ds
* Adding tests
* Fixing scala checkstyle issue
2021-05-18 16:11:39 -07:00
pengzhiwei
aacb8be521
[HUDI-1415] Read Hoodie Table As Spark DataSource Table ( #2283 )
2021-04-20 14:21:38 -07:00
Sivabalan Narayanan
8d29863c86
[HUDI-1615] Fixing usage of NULL schema for delete operation in HoodieSparkSqlWriter ( #2777 )
2021-04-14 15:35:39 +08:00
Danny Chan
ab4a7b0b4a
[HUDI-1788] Insert overwrite (table) for Flink writer ( #2808 )
Supports `INSERT OVERWRITE` and `INSERT OVERWRITE TABLE` for Flink writer.
2021-04-14 10:23:37 +08:00
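The difference between the two overwrite modes named in the commit above can be modeled in a few lines, assuming the usual Hudi semantics: `insert_overwrite` replaces only the partitions present in the incoming data, while `insert_overwrite_table` replaces the whole table. The class and method names are hypothetical, and a table is simplified to a map from partition path to file names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of overwrite semantics; not a Hudi or Flink API.
final class OverwriteSemanticsSketch {
    // INSERT OVERWRITE: each partition touched by the input is replaced wholesale;
    // partitions absent from the input are kept.
    static Map<String, List<String>> insertOverwrite(Map<String, List<String>> table,
                                                     Map<String, List<String>> incoming) {
        Map<String, List<String>> result = new TreeMap<>(table);
        result.putAll(incoming);
        return result;
    }

    // INSERT OVERWRITE TABLE: every existing partition is dropped, so the result
    // is exactly the incoming data, regardless of the current table contents.
    static Map<String, List<String>> insertOverwriteTable(Map<String, List<String>> table,
                                                          Map<String, List<String>> incoming) {
        return new TreeMap<>(incoming);
    }
}
```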
pengzhiwei
684622c7c9
[HUDI-1591] Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning ( #2651 )
2021-04-01 11:12:28 -07:00
Liulietong
ce3e8ec870
[HUDI-1667]: Fix a null value related bug for spark vectorized reader. ( #2636 )
2021-03-20 07:54:20 -07:00
xiarixiaoyao
d429169ff7
[HUDI-1688] Hudi write should uncache RDD when the write operation is finished ( #2673 )
2021-03-18 10:19:18 -07:00
n3nash
74241947c1
[HUDI-845] Added locking capability to allow multiple writers ( #2374 )
* [HUDI-845] Added locking capability to allow multiple writers
1. Added LockProvider API for pluggable lock methodologies
2. Added Resolution Strategy API to allow for pluggable conflict resolution
3. Added TableService client API to schedule table services
4. Added Transaction Manager for wrapping actions within transactions
2021-03-16 16:43:53 -07:00
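The pairing of a pluggable lock provider with a transaction manager listed in the commit above can be sketched as follows. Every name here (`SketchLockProvider`, `InProcessLockProvider`, `SketchTransactionManager`) is hypothetical and not Hudi's actual `LockProvider` or `TransactionManager` API; the in-process `ReentrantLock` only coordinates threads in one JVM, whereas real multi-writer setups need a lock shared across processes.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical pluggable lock interface (not Hudi's LockProvider).
interface SketchLockProvider {
    boolean tryLock(long timeoutMs);
    void unlock();
}

// In-process implementation backed by a ReentrantLock; single JVM only.
final class InProcessLockProvider implements SketchLockProvider {
    private final ReentrantLock lock = new ReentrantLock();

    public boolean tryLock(long timeoutMs) {
        try {
            return lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public void unlock() {
        lock.unlock();
    }

    boolean isLocked() {
        return lock.isLocked();
    }
}

// Hypothetical transaction wrapper: runs an action only while holding the lock.
final class SketchTransactionManager {
    private final SketchLockProvider provider;

    SketchTransactionManager(SketchLockProvider provider) {
        this.provider = provider;
    }

    <T> T runInTransaction(Supplier<T> action, long timeoutMs) {
        if (!provider.tryLock(timeoutMs)) {
            throw new IllegalStateException("could not acquire lock within " + timeoutMs + " ms");
        }
        try {
            return action.get();
        } finally {
            provider.unlock(); // released even if the action throws
        }
    }
}
```

Keeping the lock behind an interface is what makes the locking "pluggable": the transaction wrapper does not change when the provider is swapped.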
Sivabalan Narayanan
b038623ed3
[HUDI-1615] Fixing null schema in bulk_insert row writer path ( #2653 )
* [HUDI-1615] Avoid passing in null schema from row writing/deltastreamer
* Fixing null schema in bulk insert row writer path
* Fixing tests
Co-authored-by: vc <vinoth@apache.org >
2021-03-16 09:44:11 -07:00
pengzhiwei
bc883db5de
[HUDI-1636] Support Builder Pattern To Build Table Properties For HoodieTableConfig ( #2596 )
2021-03-05 14:10:27 +08:00
Raymond Xu
899ae70fdb
[HUDI-1587] Add latency and freshness support ( #2541 )
Save the min and max event time in each commit and compute latency and freshness metrics from them.
2021-03-03 20:13:12 -08:00
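One plausible reading of the latency and freshness metrics in the commit above, assuming latency is measured from the oldest (min) event time in a commit and freshness from the newest (max) event time, both relative to the commit time. The class and method names are hypothetical, not Hudi's metrics API.

```java
import java.util.List;
import java.util.LongSummaryStatistics;

// Hypothetical helper; names are illustrative, not Hudi's metrics API.
final class EventTimeMetricsSketch {
    final long minEventTimeMs;
    final long maxEventTimeMs;

    private EventTimeMetricsSketch(long minEventTimeMs, long maxEventTimeMs) {
        this.minEventTimeMs = minEventTimeMs;
        this.maxEventTimeMs = maxEventTimeMs;
    }

    // Per-commit step: track min and max event time across the records written.
    static EventTimeMetricsSketch fromEventTimes(List<Long> eventTimesMs) {
        if (eventTimesMs.isEmpty()) {
            throw new IllegalArgumentException("commit has no event times");
        }
        LongSummaryStatistics stats =
            eventTimesMs.stream().mapToLong(Long::longValue).summaryStatistics();
        return new EventTimeMetricsSketch(stats.getMin(), stats.getMax());
    }

    // Latency (assumed): how long the oldest record waited before being committed.
    long latencyMs(long commitTimeMs) {
        return commitTimeMs - minEventTimeMs;
    }

    // Freshness (assumed): how old the newest record already was at commit time.
    long freshnessMs(long commitTimeMs) {
        return commitTimeMs - maxEventTimeMs;
    }
}
```

Under this reading latency is always at least as large as freshness, since the min event time never exceeds the max.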
liujinhui
8c2197ae5e
[HUDI-1269] Make it configurable whether a failure to connect to Hive affects the Hudi ingest process ( #2443 )
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com >
2021-02-25 10:09:32 -05:00
Sivabalan Narayanan
c9fcf964b2
[HUDI-1315] Adding builder for HoodieTableMetaClient initialization ( #2534 )
2021-02-20 09:54:26 +08:00
pengzhiwei
37972071ff
[HUDI-1109] Support Spark Structured Streaming read from Hudi table ( #2485 )
2021-02-17 03:36:29 -08:00
teeyog
26da4f5462
[HUDI-1526] Translate the api partitionBy in spark datasource to hoodie.datasource.write.partitionpath.field ( #2431 )
2021-02-10 12:07:54 -05:00
pengzhiwei
0d8a4d0a56
[HUDI-1550] Honor ordering field for MOR Spark datasource reader ( #2497 )
2021-02-01 21:04:27 +08:00
jiangjiguang
5d053b495b
[MINOR] Add check to Quickstart.generateUpdates method ( #2505 )
2021-01-30 10:28:00 +08:00
liujinhui
244f6def9c
[MINOR] Fix DataSource being unable to use hoodie.datasource.hive_sync.auto_create_database ( #2444 )
Fix DataSource being unable to use hoodie.datasource.hive_sync.auto_create_database
2021-01-20 22:58:18 +08:00
lw0090
de42adc230
[HUDI-1520] Add config for Spark SQL overwrite to use INSERT_OVERWRITE_TABLE ( #2428 )
2021-01-11 09:07:47 -08:00
Gary Li
23e93d05c0
[MINOR] fix spark 3 build for incremental query on MOR ( #2425 )
2021-01-09 21:08:55 -08:00
Gary Li
79ec7b4894
[HUDI-920] Support Incremental query for MOR table ( #1938 )
2021-01-09 08:02:08 -08:00
Udit Mehrotra
17df517b81
[HUDI-1510] Move HoodieEngineContext and its dependencies to hudi-common ( #2410 )
2021-01-07 11:34:06 -08:00
Udit Mehrotra
4e64226844
[HUDI-1450] Use metadata table for listing in HoodieROTablePathFilter (apache#2326)
[HUDI-1394] [RFC-15] Use metadata table (if present) to get all partition paths (apache#2351)
2021-01-04 07:59:47 -08:00
pengzhiwei
b83d1d3e61
[HUDI-1484] Escape the partition value in HiveSyncTool ( #2363 )
2020-12-28 23:02:36 -05:00
wenningd
286055ce34
[HUDI-1451] Support bulk insert v2 with Spark 3.0.0 ( #2328 )
Co-authored-by: Wenning Ding <wenningd@amazon.com >
- Added support for bulk insert v2 with datasource v2 api in Spark 3.0.0.
2020-12-25 09:43:34 -05:00