Commit Graph

2620 Commits

Author SHA1 Message Date
Raymond Xu
686da41696 [HUDI-3689] Fix UT failures in TestHoodieDeltaStreamer (#5120) 2022-03-24 09:10:33 -07:00
Raymond Xu
b14706502b [HUDI-3689] Remove Azure CI cache (#5121) 2022-03-24 05:39:11 -07:00
Alexey Kudinkin
ccc3728002 [HUDI-3684] Fixing NPE in ParquetUtils (#5102)
* Make sure nulls are properly handled in `HoodieColumnRangeMetadata`
2022-03-24 17:37:38 +05:30
Sagar Sumit
fe2c3989e3 [HUDI-3689] Fix glob path and hive sync in deltastreamer tests (#5117)
* Remove glob pattern basePath from the deltastreamer tests.

* [HUDI-3689] Fix file scheme config

for CI failure in TestHoodieRealTimeRecordReader

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-03-24 15:48:35 +05:30
Danny Chan
a1c42fcc07 [minor] Checks the data block type for archived timeline (#5106) 2022-03-24 14:10:43 +08:00
Sivabalan Narayanan
52f0498330 Fixing non partitioned all files record in MDT (#5108) 2022-03-23 19:26:39 -07:00
Sagar Sumit
f96ba7abf0 [HUDI-3642] Handle NPE due to empty requested replacecommit metadata (#5090) 2022-03-23 12:13:02 -07:00
Rajesh Mahindra
5f570ea151 [HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs (#4175)
- Refactor hive sync tool / config to use reflection and standardize configs

Co-authored-by: sivabalan <n.siva.b@gmail.com>
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-03-21 22:56:31 -04:00
Y Ethan Guo
9b6e138af2 [HUDI-3640] Set SimpleKeyGenerator as default in 2to3 table upgrade for Spark engine (#5075) 2022-03-21 20:35:06 -04:00
Pratyaksh Sharma
ca0931d332 [HUDI-1436]: Provide an option to trigger clean every nth commit (#4385)
- Provided option to trigger clean every nth commit with default number of commits as 1 so that existing users are not affected.
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-03-21 20:06:30 -04:00
wxp4532
26e5d2e6fc [HUDI-3559] Flink bucket index with COW table throws NoSuchElementException
The method FlinkWriteHelper#deduplicateRecords does not guarantee record order, but there is an
implicit constraint: all the records in one bucket should have the same bucket type (instant time here).
BucketStreamWriteFunction breaks this constraint.

close apache/hudi#5018
2022-03-21 17:34:54 +08:00
Sivabalan Narayanan
a118d56b07 [MINOR] Fixing sparkUpdateNode for record generation (#5079) 2022-03-21 00:56:30 -04:00
Danny Chan
799c78e688 [HUDI-3665] Support flink multiple versions (#5072) 2022-03-21 10:34:50 +08:00
Y Ethan Guo
15d1c18625 [MINOR] Remove flaky assert in TestInLineFileSystem (#5069) 2022-03-20 18:58:30 -04:00
Alexey Kudinkin
1b6e201160 [HUDI-3663] Fixing Column Stats index to properly handle first Data Table commit (#5070)
* Fixed metadata conversion util to extract schema from `HoodieCommitMetadata`

* Fixed failure to fetch columns to index in empty table

* Abort indexing seq in case there are no columns to index

* Fall back to indexing at least the primary key columns in case no writer schema could be obtained to index all columns

* Fixed `getRecordFields` incorrectly ignoring default value

* Make sure Hudi metadata fields are also indexed
2022-03-20 10:24:13 +05:30
Alexey Kudinkin
099c2c099a [HUDI-3457] Refactored Spark DataSource Relations to avoid code duplication (#4877)
Refactoring Spark DataSource Relations to avoid code duplication. 

The following Relations were in scope:

- BaseFileOnlyViewRelation
- MergeOnReadSnapshotRelation
- MergeOnReadIncrementalRelation
2022-03-18 22:32:16 -07:00
Sivabalan Narayanan
316e38c71e [HUDI-3659] Reducing the validation frequency with integ tests (#5067) 2022-03-18 12:45:33 -04:00
Sivabalan Narayanan
2551c26183 [HUDI-3656] Adding medium sized dataset for clustering and minor fixes to integ tests (#5063) 2022-03-18 12:44:56 -04:00
JerryYue-M
6fe4d6e2f6 [HUDI-3598] Row Data to Hoodie Record Operator parallelism needs to always be consistent with input operator (#5049)
for chaining purpose

Co-authored-by: jerryyue <jerryyue@didiglobal.com>
2022-03-18 10:47:29 +08:00
RexAn
9ece77561a [MINOR] HoodieFileScanRDD could print null path (#5056)
Co-authored-by: Rex An <bonean131@gmail.com>
2022-03-17 12:53:45 -07:00
Raymond Xu
7446ff95a7 [HUDI-2439] Replace RDD with HoodieData in HoodieSparkTable and commit executors (#4856)
- Adopt HoodieData in Spark action commit executors
- Make Spark independent DeleteHelper, WriteHelper, MergeHelper in hudi-client-common
- Make HoodieTable in WriteClient APIs have raw type to decouple with Client's generic types
2022-03-17 04:17:56 -07:00
冯健
bf191f8d46 [HUDI-3645] Fix NPE caused by multiple threads accessing non-thread-safe HashMap (#5028)
- Change HashMap in HoodieROTablePathFilter to ConcurrentHashMap
2022-03-17 14:20:28 +05:30
Y Ethan Guo
5ba2d9ab2f [HUDI-3494] Consider triggering condition of MOR compaction during archival (#4974) 2022-03-17 01:28:11 -04:00
Y Ethan Guo
95e6e53810 [HUDI-3404] Automatically adjust write configs based on metadata table and write concurrency mode (#4975) 2022-03-17 01:25:04 -04:00
YueZhang
8ca9a54db0 [HUDI-3376] Add an option to skip under deletion files for HoodieMetadataTableValidator (#4994)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-03-16 18:31:00 -07:00
that's cool
91849c3d66 [HUDI-3607] Support backend switch in HoodieFlinkStreamer (#5032)
* [HUDI-3607] Support backend switch in HoodieFlinkStreamer

* [HUDI-3607] Support backend switch in HoodieFlinkStreamer
1. checkstyle fix

* [HUDI-3607] Support backend switch in HoodieFlinkStreamer
1. change the msg
2022-03-16 10:07:31 +04:00
Y Ethan Guo
296a0e6bcf [HUDI-3588] Remove hudi-common and hudi-hadoop-mr jars in Presto Docker image (#4997) 2022-03-15 18:49:30 -07:00
todd5167
55dca969f9 [HUDI-3589] flink sync hive metadata supports table properties and serde properties (#4995) 2022-03-15 23:56:37 +04:00
Sagar Sumit
d514570e90 [HUDI-3633] Allow non-string values to be set in TypedProperties (#5045)
* [HUDI-3633] Allow non-string values to be set in TypedProperties

* Override getProperty to ignore instanceof string check
2022-03-15 22:33:22 +04:00
Alexey Kudinkin
5e8ff8d793 [HUDI-3514] Rebase Data Skipping flow to rely on MT Column Stats index (#4948) 2022-03-15 10:38:36 -07:00
l-shen
9bdda2a312 [HUDI-3619] Fix HoodieOperation fromValue using wrong constant value (#5033)
Co-authored-by: root <l-shen@localhost.localdomain>
2022-03-15 16:34:31 +04:00
Thinking Chen
6ed7106e59 [HUDI-3606] Add org.objenesis:objenesis to hudi-timeline-server-bundle pom (#5017) 2022-03-15 15:06:50 +04:00
wangxianghu
3b59b76952 [HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string (#4987)
* [HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string

* add ut

* Address comment
2022-03-15 15:06:30 +04:00
Sivabalan Narayanan
d40adfa2d7 [HUDI-3620] Adding spark3.2.0 profile (#5038) 2022-03-14 19:14:00 -04:00
Sivabalan Narayanan
30cf39301e [HUDI-3623] Removing hive sync node from non hive yamls (#5040) 2022-03-14 18:39:26 -04:00
Sivabalan Narayanan
22c3ce73db [HUDI-3621] Fixing NullPointerException in DeltaStreamer (#5039) 2022-03-14 18:34:17 -04:00
wangxianghu
003c6ee73e [MINOR] Remove repeated kafka-clients dependencies (#5034) 2022-03-14 18:24:06 +04:00
peanut-chenzhong
4b75cb6f23 Fix NPE when running schedule via spark-sql if the number of commits < hoodie.compact.inline.max.delta.commits (#4976)
* Update CompactionHoodiePathCommand.scala

Fix NPE when running schedule via spark-sql if the number of commits < hoodie.compact.inline.max.delta.commits

* Update CompactionHoodiePathCommand.scala

Fix IndexOutOfBoundsException when there's no schedule for compaction

* Update CompactionHoodiePathCommand.scala

fix CI issue
2022-03-14 16:40:38 +08:00
Danny Chan
465d553df8 [HUDI-3600] Tweak the default cleaning strategy to be more streaming friendly for flink (#5010) 2022-03-14 14:22:07 +08:00
Sivabalan Narayanan
1ba8220617 [HUDI-3613] Adding/fixing yamls for metadata (#5029) 2022-03-13 21:11:37 -04:00
ForwardXu
6c8224cae6 [HUDI-3501] Support savepoints command based on Call Procedure Command (#5025) 2022-03-13 16:58:21 +04:00
liujinhui
e60acc1258 [HUDI-3583] Fix MarkerBasedRollbackStrategy NoSuchElementException (#4984)
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
2022-03-12 23:00:50 -08:00
Sagar Sumit
eee96e9af3 [HUDI-3593] Restore TypedProperties and flush checksum in table config (#5013)
Create new TypedProperties while performing clustering

Add OrderedProperties and minor refactoring

Add javadoc and remove getters from OrderedProperties
2022-03-13 07:58:55 +05:30
Sivabalan Narayanan
e7bb0413af [HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way (#4971) 2022-03-11 18:40:13 -05:00
wangxianghu
e8918b6c2c [HUDI-3569] Introduce ChainedJsonKafkaSourcePostProcessor to support setting multi processors at once (#4969) 2022-03-11 17:49:30 -05:00
RexAn
93277b2bcd [HUDI-3592] Fix NPE of DefaultHoodieRecordPayload if Property is empty (#4999)
Co-authored-by: Rex An <bonean131@gmail.com>
2022-03-11 17:45:40 -05:00
Alexey Kudinkin
5d59bf67ae [HUDI-3513] Make sure Column Stats does not fail in case it fails to load previous Index Table state (#5015) 2022-03-11 17:39:22 -05:00
huberylee
56cb49485d [HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable (#4982) 2022-03-11 13:23:19 -08:00
wangxianghu
b00180342e [HUDI-3575] Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor (#5019) 2022-03-11 15:03:42 +04:00
苏承祥
faed6996ee [HUDI-3566] Add thread factory in BoundedInMemoryExecutor (#4926)
Co-authored-by: 苏承祥 <sucx@tuya.com>
2022-03-11 18:58:49 +08:00