1
0
Commit Graph

122 Commits

Author SHA1 Message Date
Shawy Geng
44e41dc9bb [HUDI-2117] Unpersist the input rdd after the commit is completed to … (#3207)
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-07-29 08:16:58 -07:00
rmahindra123
8fef50e237 [HUDI-2044] Integrate consumers with rocksDB and compression within External Spillable Map (#3318) 2021-07-28 01:31:03 -04:00
Sivabalan Narayanan
61148c1c43 [HUDI-2176, 2178, 2179] Adding virtual key support to COW table (#3306) 2021-07-26 17:21:04 -04:00
xiarixiaoyao
5353243449 [HUDI-2214]residual temporary files after clustering are not cleaned up (#3335) 2021-07-26 10:26:20 -07:00
Gary Li
a5638b995b [MINOR] Close log scanner after compaction completed (#3294) 2021-07-26 17:39:13 +08:00
Sivabalan Narayanan
d5026e9a24 [HUDI-2161] Adding support to disable meta columns with bulk insert operation (#3247) 2021-07-19 20:43:48 -04:00
Jintao Guan
2debb9b3ed [HUDI-1828] Update unit tests to support ORC as the base file format (#3237) 2021-07-15 00:05:42 +08:00
zhangyue19921010
c8a2033c27 [HUDI-2144]Bug-Fix:Offline clustering(HoodieClusteringJob) will cause insert action losing data (#3240)
* fixed

* add testUpsertPartitionerWithSmallFileHandlingAndClusteringPlan ut

* fix CheckStyle

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-07-12 18:14:17 -07:00
Sagar Sumit
5804ad8e32 [HUDI-1483] Support async clustering for deltastreamer and Spark streaming (#3142)
- Integrate async clustering service with HoodieDeltaStreamer and HoodieStreamingSink
- Added methods in HoodieAsyncService to reuse code
2021-07-11 14:43:38 -04:00
Sivabalan Narayanan
8c0dbaa9b3 [HUDI-2009] Fixing extra commit metadata in row writer path (#3075) 2021-07-08 03:07:27 -04:00
Yungthuis
1d3cd06572 [HUDI-2134]Add generics to avoif forced conversion in BaseSparkCommitActionExecutor#partition (#3232) 2021-07-08 13:31:38 +08:00
Sivabalan Narayanan
16e90d30ea [HUDI-1105] Adding dedup support for Bulk Insert w/ Rows (#2206) 2021-07-07 17:38:26 -04:00
Sivabalan Narayanan
ea9e5d0e8b [HUDI-1104] Adding support for UserDefinedPartitioners and SortModes to BulkInsert with Rows (#3149) 2021-07-07 11:15:25 -04:00
Prashant Wason
990820476a [HUDI-2140] Fixed the unit test TestHoodieBackedMetadata.testOnlyValidPartitionsAdded. (#3234) 2021-07-06 23:50:27 -07:00
Prashant Wason
221ddd9bf3 [HUDI-2016] Fixed bootstrap of Metadata Table when some actions are in progress. (#3083)
Metadata Table cannot be bootstrapped when any action is in progress. This is detected by the presence of inflight or requested instants. The bootstrapping is initiated in preWrite and postWrite of each commit. So bootstrapping will be retried again until it succeeds.
Also added metrics for when the bootstrapping fails or a table is re-bootstrapped. This will help detect tables which are not getting bootstrapped.
2021-07-06 08:08:46 -07:00
Randal Boyle
60e0254e67 [HUDI-1996] Adding functionality to allow the providing of basic auth creds for confluent cloud schema registry (#3097)
* adding support for basic auth with confluent cloud schema registry
2021-07-05 23:40:23 -07:00
wangxianghu
650c4455c6 [HUDI-2122] Improvement in packaging insert into smallfiles (#3213) 2021-07-05 09:30:57 -07:00
wangxianghu
62a1ad8b3a [HUDI-1930] Bootstrap support configure KeyGenerator by type (#3170)
* [HUDI-1930] Bootstrap support configure KeyGenerator by type
2021-07-03 20:27:37 +08:00
pengzhiwei
6eca06d074 [HUDI-2105] Compaction Failed For MergeInto MOR Table (#3190) 2021-07-01 23:40:14 +08:00
wenningd
d412fb2fe6 [HUDI-89] Add configOption & refactor all configs based on that (#2833)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-06-30 14:26:30 -07:00
leesf
e039e0ff6d [HUDI-2064] Fix TestHoodieBackedMetadata#testOnlyValidPartitionsAdded (#3141) 2021-06-24 07:37:55 +08:00
Prashant Wason
11e64b2db0 [HUDI-1717] Metadata Reader should merge all the un-synced but complete instants from the dataset timeline. (#3082) 2021-06-22 23:52:18 +08:00
Prashant Wason
062d5baf84 [HUDI-2013] Removed option to fallback to file listing when Metadata Table is enabled. (#3079) 2021-06-22 23:41:52 +08:00
Rong Ma
7bd517a82f [HUDI-2031] JVM occasionally crashes during compaction when spark speculative execution is enabled (#3093)
* unit tests added
2021-06-21 18:09:51 -07:00
Wei
7865da1e15 [MINOR] Fix Javadoc wrong references (#3115) 2021-06-18 21:51:54 -07:00
swuferhong
5ce64a81bd Fix the filter condition is missing in the judgment condition of compaction instance (#3025) 2021-06-16 14:28:53 -07:00
Jintao Guan
b8fe5b91d5 [HUDI-764] [HUDI-765] ORC reader writer Implementation (#2999)
Co-authored-by: Qingyun (Teresa) Kang <kteresa@uber.com>
2021-06-15 15:21:43 -07:00
wangxianghu
7261f08507 [HUDI-1929] Support configure KeyGenerator by type (#2993) 2021-06-08 09:26:10 -04:00
pengzhiwei
f760ec543e [HUDI-1659] Basic Implement Of Spark Sql Support For Hoodie (#2645)
Main functions:
Support create table for hoodie.
Support CTAS.
Support Insert for hoodie. Including dynamic partition and static partition insert.
Support MergeInto for hoodie.
Support DELETE
Support UPDATE
Both support spark2 & spark3 based on DataSourceV1.

Main changes:
Add sql parser for spark2.
Add HoodieAnalysis for sql resolve and logical plan rewrite.
Add commands implementation for CREATE TABLE、INSERT、MERGE INTO & CTAS.
In order to push down the update&insert logical to the HoodieRecordPayload for MergeInto, I make same change to the
HoodieWriteHandler and other related classes.
1、Add the inputSchema for parser the incoming record. This is because the inputSchema for MergeInto is different from writeSchema as there are some transforms in the update& insert expression.
2、Add WRITE_SCHEMA to HoodieWriteConfig to pass the write schema for merge into.
3、Pass properties to HoodieRecordPayload#getInsertValue to pass the insert expression and table schema.


Verify this pull request
Add TestCreateTable for test create hoodie tables and CTAS.
Add TestInsertTable for test insert hoodie tables.
Add TestMergeIntoTable for test merge hoodie tables.
Add TestUpdateTable for test update hoodie tables.
Add TestDeleteTable for test delete hoodie tables.
Add TestSqlStatement for test supported ddl/dml currently.
2021-06-07 23:24:32 -07:00
rmpifer
0709c62a6b [HUDI-1800] Exclude file slices in pending compaction when performing small file sizing (#2902)
Co-authored-by: Ryan Pifer <ryanpife@amazon.com>
2021-05-29 08:06:01 -04:00
Susu Dong
685f77b5dd [HUDI-1740] Fix insert-overwrite API archival (#2784)
- fix problem of archiving replace commits
- Fix problem when getting empty replacecommit.requested
- Improved the logic of handling empty and non-empty requested/inflight commit files. Added unit tests to cover both empty and non-empty inflight files cases and cleaned up some unused test util methods

Co-authored-by: yorkzero831 <yorkzero8312@gmail.com>
Co-authored-by: zheren.yu <zheren.yu@paypay-corp.co.jp>
2021-05-21 13:52:13 -07:00
Y Ethan Guo
a96034d38d [HUDI-1888] Fix NPE when the nested partition path field has null value (#2957) 2021-05-21 08:28:11 -04:00
wangxianghu
ced068e1ee [MINOR] Remove unused method in BaseSparkCommitActionExecutor (#2965) 2021-05-20 10:18:07 +08:00
lw0090
5a8b2a4f86 [HUDI-1768] add spark datasource unit test for schema validate add column (#2776) 2021-05-11 16:49:18 -04:00
TeRS-K
be9db2c4f5 [HUDI-1055] Remove hardcoded parquet in tests (#2740)
* Remove hardcoded parquet in tests
* Use DataFileUtils.getInstance
* Renaming DataFileUtils to BaseFileUtils

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-05-11 10:01:45 -07:00
satishkotha
386767693d [HUDI-1833] rollback pending clustering even if there is greater commit (#2863)
* [HUDI-1833] rollback pending clustering even if there are greater commits
2021-04-27 14:21:42 -07:00
satishkotha
2999586509 [HUDI-1690] use jsc union instead of rdd union (#2872) 2021-04-26 23:35:01 -07:00
Danny Chan
d047e91d86 [HUDI-1837] Add optional instant range to log record scanner for log (#2870) 2021-04-26 16:53:18 +08:00
Chanh Le
a1e636dc6b [HUDI-1551] Add support for BigDecimal and Integer when partitioning based on time. (#2851)
Co-authored-by: trungchanh.le <trungchanh.le@bybit.com>
2021-04-22 21:56:20 +08:00
jsbali
b31c520c66 [HUDI-1714] Added tests to TestHoodieTimelineArchiveLog for the archival of compl… (#2677)
* Added tests to TestHoodieTimelineArchiveLog for the archival of completed clean and rollback actions.

* Adding code review changes

* [HUDI-1714] Minor Fixes
2021-04-21 10:27:43 -07:00
li36909
6b4b878d08 [HUDI-1744] rollback fails on mor table when the partition path hasn't any files (#2749)
Co-authored-by: lrz <lrz@lrzdeMacBook-Pro.local>
2021-04-19 15:44:11 -07:00
Aditya Tiwari
ec2334ceac [HUDI-1716]: Resolving default values for schema from dataframe (#2765)
- Adding default values and setting null as first entry in UNION data types in avro schema. 

Co-authored-by: Aditya Tiwari <aditya.tiwari@flipkart.com>
2021-04-19 10:05:20 -04:00
hj2016
1da16dfd2e [HUDI-1784] Added print detailed stack log when hbase connection error (#2799) 2021-04-12 13:46:06 +08:00
hongdd
ecdbd2517f [HUDI-699] Fix CompactionCommand and add unit test for CompactionCommand (#2325) 2021-04-08 15:35:33 +08:00
pengzhiwei
684622c7c9 [HUDI-1591] Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning (#2651) 2021-04-01 11:12:28 -07:00
Sebastian Bernauer
aa0da72c59 Preparation for Avro update (#2650) 2021-03-30 21:50:17 -07:00
Gary Li
452f5e2d66 [HOTFIX] close spark session in functional test suite and disable spark3 test for spark2 (#2727) 2021-03-29 06:04:48 -07:00
garyli1019
6e803e08b1 Moving to 0.9.0-SNAPSHOT on master branch. 2021-03-24 21:37:14 +08:00
Jintao Guan
1277c62398 [HUDI-1653] Add support for composite keys in NonpartitionedKeyGenerator (#2627)
* [HUDI-1653] Add support for composite keys in NonpartitionedKeyGenerator

* update NonpartitionedKeyGenerator to support composite record keys

* update NonpartitionedKeyGenerator
2021-03-18 15:33:31 -07:00
n3nash
74241947c1 [HUDI-845] Added locking capability to allow multiple writers (#2374)
* [HUDI-845] Added locking capability to allow multiple writers
1. Added LockProvider API for pluggable lock methodologies
2. Added Resolution Strategy API to allow for pluggable conflict resolution
3. Added TableService client API to schedule table services
4. Added Transaction Manager for wrapping actions within transactions
2021-03-16 16:43:53 -07:00