343 Commits

Author SHA1 Message Date
pengzhiwei
bbadac7de1 [HUDI-1425] Performance loss with the additional hoodieRecords.isEmpty() in HoodieSparkSqlWriter#write (#2296) 2021-07-28 21:30:18 -07:00
rmahindra123
8fef50e237 [HUDI-2044] Integrate consumers with rocksDB and compression within External Spillable Map (#3318) 2021-07-28 01:31:03 -04:00
Danny Chan
ab2e0d0ba2 [HUDI-2219] Fix NPE of HoodieConfig (#3342) 2021-07-27 15:18:05 +08:00
Sivabalan Narayanan
61148c1c43 [HUDI-2176, 2178, 2179] Adding virtual key support to COW table (#3306) 2021-07-26 17:21:04 -04:00
rmahindra123
a14b19fdd5 [HUDI-1241] Automate the generation of configs webpage as configs are added to Hudi repo (#3302) 2021-07-23 21:33:34 -07:00
jsbali
66207ed91a [HUDI-1848] Adding support for HMS for running DDL queries in hive-sy… (#2879)
* [HUDI-1848] Adding support for HMS for running DDL queries in hive-sync-tool

* [HUDI-1848] Fixing test cases

* [HUDI-1848] CR changes

* [HUDI-1848] Fix checkstyle violations

* [HUDI-1848] Fixed a bug when metastore api fails for complex schemas with multiple levels.

* [HUDI-1848] Adding the complex schema and resolving merge conflicts

* [HUDI-1848] Adding some more javadocs

* [HUDI-1848] Added javadocs for DDLExecutor impls

* [HUDI-1848] Fixed style issue
2021-07-23 09:03:15 -07:00
pengzhiwei
2c910ee3af [HUDI-2212] Missing PrimaryKey In Hoodie Properties For CTAS Table (#3332) 2021-07-23 15:21:57 +08:00
pengzhiwei
5a2f3d439e [HUDI-2139] MergeInto MOR Table May Result InCorrect Result (#3230) 2021-07-23 10:19:43 +08:00
Samrat
a086d255c8 [HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer (#3184) 2021-07-19 21:49:43 -04:00
Sivabalan Narayanan
d5026e9a24 [HUDI-2161] Adding support to disable meta columns with bulk insert operation (#3247) 2021-07-19 20:43:48 -04:00
rmahindra123
d024439764 [HUDI-2029] Implement compression for DiskBasedMap in Spillable Map (#3128) 2021-07-14 22:57:38 -04:00
vinoth chandar
75040ee9e5 [HUDI-2149] Ensure and Audit docs for every configuration class in the codebase (#3272)
- Added docs when missing
- Rewrote, reworded as needed
- Made a couple more classes extend HoodieConfig
2021-07-14 10:56:08 -07:00
Jintao Guan
2debb9b3ed [HUDI-1828] Update unit tests to support ORC as the base file format (#3237) 2021-07-15 00:05:42 +08:00
Vinay Patil
7395a56dfb [HUDI-2168] Fix for AccessControlException for anonymous user (#3264) 2021-07-13 08:56:51 -04:00
zhangyue19921010
c8a2033c27 [HUDI-2144]Bug-Fix:Offline clustering(HoodieClusteringJob) will cause insert action losing data (#3240)
* fixed

* add testUpsertPartitionerWithSmallFileHandlingAndClusteringPlan ut

* fix CheckStyle

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-07-12 18:14:17 -07:00
Sagar Sumit
5804ad8e32 [HUDI-1483] Support async clustering for deltastreamer and Spark streaming (#3142)
- Integrate async clustering service with HoodieDeltaStreamer and HoodieStreamingSink
- Added methods in HoodieAsyncService to reuse code
2021-07-11 14:43:38 -04:00
Shawy Geng
55ecbc662e [HUDI-2115] FileSlices in the filegroup is not descending by timestamp (#3206) 2021-07-07 22:24:36 +08:00
Prashant Wason
221ddd9bf3 [HUDI-2016] Fixed bootstrap of Metadata Table when some actions are in progress. (#3083)
Metadata Table cannot be bootstrapped when any action is in progress. This is detected by the presence of inflight or requested instants. Bootstrapping is initiated in preWrite and postWrite of each commit, so it is retried until it succeeds.
Also added metrics for when the bootstrapping fails or a table is re-bootstrapped. This will help detect tables which are not getting bootstrapped.
2021-07-06 08:08:46 -07:00
dwshmilyss
2b21ae1775 [HUDI-2046] Loaded too many classes like sun/reflect/GeneratedSerializationConstructorAccessor in JVM metaspace (#3121)
Too many classes were loaded when using Spark's Kryo serialization with Hudi

Co-authored-by: weiwei.duan <weiwei.duan@linkflowtech.com>
2021-07-05 23:36:55 -07:00
rmahindra123
a4dcbb5c5a [HUDI-2028] Implement RockDbBasedMap as an alternate to DiskBasedMap in ExternalSpillableMap (#3194)
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
2021-07-05 23:03:41 -07:00
Shawy Geng
2033d35dc3 [HUDI-2127] Initialize the maxMemorySizeInBytes in log scanner (#3220) 2021-07-05 11:53:18 +08:00
wenningd
d412fb2fe6 [HUDI-89] Add configOption & refactor all configs based on that (#2833)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-06-30 14:26:30 -07:00
s-sanjay
0fb8556b0d Add ability to provide multi-region (global) data consistency across HMS in different regions (#2542)
[global-hive-sync-tool] Add a global hive sync tool to sync hudi tables across clusters. Add a way to roll back the replicated timestamp if the sync fails or only partly completes

Co-authored-by: Jagmeet Bali <jsbali@uber.com>
2021-06-24 20:26:26 -07:00
Prashant Wason
11e64b2db0 [HUDI-1717] Metadata Reader should merge all the un-synced but complete instants from the dataset timeline. (#3082) 2021-06-22 23:52:18 +08:00
Prashant Wason
062d5baf84 [HUDI-2013] Removed option to fallback to file listing when Metadata Table is enabled. (#3079) 2021-06-22 23:41:52 +08:00
Rong Ma
7bd517a82f [HUDI-2031] JVM occasionally crashes during compaction when spark speculative execution is enabled (#3093)
* unit tests added
2021-06-21 18:09:51 -07:00
swuferhong
cb5cd35991 [HUDI-2043] HoodieDefaultTimeline$filterPendingCompactionTImeline() method have wrong filter condition (#3109) 2021-06-21 17:53:54 -07:00
Wei
53396061cc [MINOR] Fix wrong package name (#3114) 2021-06-19 11:50:01 +08:00
Jintao Guan
b8fe5b91d5 [HUDI-764] [HUDI-765] ORC reader writer Implementation (#2999)
Co-authored-by: Qingyun (Teresa) Kang <kteresa@uber.com>
2021-06-15 15:21:43 -07:00
Raymond Xu
f922837064 [HUDI-1950] Fix Azure CI failure in TestParquetUtils (#2984)
* fix azure pipeline configs

* add pentaho.org in maven repositories

* Make sure file paths with scheme in TestParquetUtils

* add azure build status to README
2021-06-15 03:45:17 -07:00
Prashant Wason
515ce8eb36 [MINOR] Fixed the log which should only be printed when the Metadata Table is disabled. (#3080) 2021-06-15 16:18:15 +08:00
Xuedong Luan
673d62f3c3 [MINOR] Add Tencent Cloud HDFS storage support for hudi (#3064) 2021-06-11 09:16:51 +08:00
JunZhang
e0108e972e [MINOR] Add Baidu BOS storage support for hudi (#3061)
Co-authored-by: zhangjun30 <zhangjun30@baidu.com>
2021-06-10 15:51:36 +08:00
Vinay Patil
11360f707e [HUDI-1892] Fix NPE when avro field value is null (#3051) 2021-06-08 18:12:18 -04:00
pengzhiwei
f760ec543e [HUDI-1659] Basic Implement Of Spark Sql Support For Hoodie (#2645)
Main functions:
Support create table for hoodie.
Support CTAS.
Support Insert for hoodie. Including dynamic partition and static partition insert.
Support MergeInto for hoodie.
Support DELETE
Support UPDATE
Supports both spark2 and spark3, based on DataSourceV1.

Main changes:
Add sql parser for spark2.
Add HoodieAnalysis for sql resolve and logical plan rewrite.
Add commands implementation for CREATE TABLE, INSERT, MERGE INTO and CTAS.
To push down the update and insert logic to the HoodieRecordPayload for MergeInto, some changes were made to the HoodieWriteHandler and other related classes:
1. Add the inputSchema for parsing the incoming record. This is because the inputSchema for MergeInto is different from writeSchema, as there are some transforms in the update and insert expressions.
2. Add WRITE_SCHEMA to HoodieWriteConfig to pass the write schema for merge into.
3. Pass properties to HoodieRecordPayload#getInsertValue to pass the insert expression and table schema.


Verify this pull request
Add TestCreateTable to test creating hoodie tables and CTAS.
Add TestInsertTable to test inserting into hoodie tables.
Add TestMergeIntoTable to test merging into hoodie tables.
Add TestUpdateTable to test updating hoodie tables.
Add TestDeleteTable to test deleting from hoodie tables.
Add TestSqlStatement to test the currently supported DDL/DML.
2021-06-07 23:24:32 -07:00
Vinay Patil
f3d7b49bfe [HUDI-1148] Remove Hadoop Conf Logs (#3040) 2021-06-07 14:49:55 -07:00
Vinay Patil
cf90f17732 [HUDI-1281] Add deltacommit to ActionType (#3018)
Co-authored-by: veenaypatil <vinay18.patil@gmail.com>
2021-06-04 22:30:48 -07:00
Wei
f6eee77636 [MINOR] Remove the implementation of Serializable from HoodieException (#3020) 2021-06-03 19:46:33 +08:00
hk__lrzy
83b0301c1a [HUDI-1943] Lose properties when hoodieWriteConfig initializtion (#3006)
* [hudi-flink]fix lose properties problem

Co-authored-by: haoke <haoke@bytedance.com>
2021-06-01 16:09:48 +08:00
Yao WANG
7a63175a70 fix the grammer err of the comment (#3013)
Co-authored-by: ywang46 <ywang46@paypal.com>
2021-05-31 11:44:25 +08:00
rmpifer
0709c62a6b [HUDI-1800] Exclude file slices in pending compaction when performing small file sizing (#2902)
Co-authored-by: Ryan Pifer <ryanpife@amazon.com>
2021-05-29 08:06:01 -04:00
Raymond Xu
afa6bc0b10 [HUDI-1723] Fix path selector listing files with the same mod date (#2845) 2021-05-25 10:19:10 -04:00
wangxianghu
e7020748b5 [HUDI-1920] Set archived as the default value of HOODIE_ARCHIVELOG_FOLDER_PROP_NAME (#2978) 2021-05-25 16:29:55 +08:00
wangxianghu
6539813733 [MINOR] Update the javadoc of EngineType (#2979) 2021-05-22 19:44:08 +08:00
Susu Dong
685f77b5dd [HUDI-1740] Fix insert-overwrite API archival (#2784)
- fix problem of archiving replace commits
- Fix problem when getting empty replacecommit.requested
- Improved the logic of handling empty and non-empty requested/inflight commit files. Added unit tests to cover both empty and non-empty inflight files cases and cleaned up some unused test util methods

Co-authored-by: yorkzero831 <yorkzero8312@gmail.com>
Co-authored-by: zheren.yu <zheren.yu@paypay-corp.co.jp>
2021-05-21 13:52:13 -07:00
zhangminglei
fe3f5c2d56 [HUDI-1913] Using streams instead of loops for input/output (#2962) 2021-05-19 09:13:38 +08:00
Danny Chan
46a2399a45 [HUDI-1902] Global index for flink writer (#2958)
Supports deduplication for record keys with different partition path.
2021-05-18 13:55:38 +08:00
xoln ann
12443e4187 [HUDI-1446] Support skip bootstrapIndex's init in abstract fs view init (#2520)
Co-authored-by: zhongliang <zhongliang@kuaishou.com>
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-05-14 00:29:26 -04:00
TeRS-K
be9db2c4f5 [HUDI-1055] Remove hardcoded parquet in tests (#2740)
* Remove hardcoded parquet in tests
* Use DataFileUtils.getInstance
* Renaming DataFileUtils to BaseFileUtils

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-05-11 10:01:45 -07:00
Volodymyr Burenin
8a48d16e41 [HUDI-1707] Reduces log level for too verbose messages from info to debug level. (#2714)
* Reduces log level for too verbose messages from info to debug level.
* Sort config output.
* Code Review : Small restructuring + rebasing to master
 - Fixing flaky multi delta streamer test
 - Using isDebugEnabled() checks
 - Some changes to shorten log message without moving to DEBUG

Co-authored-by: volodymyr.burenin <volodymyr.burenin@cloudkitchens.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-05-10 07:16:02 -07:00