2631 Commits

Author SHA1 Message Date
Alexey Kudinkin
8b38ddedc2 [HUDI-3594] Supporting Composite Expressions over Data Table Columns in Data Skipping flow (#4996) 2022-03-24 22:27:15 -07:00
Danny Chan
8896864d7b [HUDI-3678] Fix record rewrite of create handle when 'preserveMetadata' is true (#5088) 2022-03-25 11:48:50 +08:00
Surya Prasanna
2fd9a4de5c [HUDI-3580] Claim RFC number 48 for LogCompaction action RFC (#5128) 2022-03-24 20:26:04 -07:00
Zhaojing Yu
483ee843e6 [HUDI-3703] Reset taskID in restoreWriteMetadata (#5122) 2022-03-25 10:18:28 +08:00
Y Ethan Guo
eaa4c4f2e2 [HUDI-1180] Upgrade HBase to 2.4.9 (#5004)
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2022-03-24 19:04:53 -07:00
Danny Chan
5e86cdd1e9 [HUDI-3701] Flink bulk_insert support bucket hash index (#5118) 2022-03-25 09:01:42 +08:00
Y Ethan Guo
608d4bf32d [HUDI-3638] Make ZookeeperBasedLockProvider serializable (#5112) 2022-03-24 17:59:47 -07:00
Y Ethan Guo
9b3dd2e0b7 [HUDI-3624] Check all instants before starting a commit in metadata table (#5098) 2022-03-24 17:13:58 -07:00
Y Ethan Guo
4ddd094ba2 [HUDI-3689] Disable flaky tests in TestHoodieDeltaStreamer (#5127) 2022-03-24 16:42:44 -07:00
Raymond Xu
ff136658a0 [HUDI-3689] Fix delta streamer tests (#5124) 2022-03-24 14:19:53 -07:00
Y Ethan Guo
44ab3b73ed [HUDI-3706] Downgrade maven surefire and failsafe version (#5123) 2022-03-24 09:31:46 -07:00
Raymond Xu
686da41696 [HUDI-3689] Fix UT failures in TestHoodieDeltaStreamer (#5120) 2022-03-24 09:10:33 -07:00
Raymond Xu
b14706502b [HUDI-3689] Remove Azure CI cache (#5121) 2022-03-24 05:39:11 -07:00
Alexey Kudinkin
ccc3728002 [HUDI-3684] Fixing NPE in ParquetUtils (#5102)
* Make sure nulls are properly handled in `HoodieColumnRangeMetadata`
2022-03-24 17:37:38 +05:30
Sagar Sumit
fe2c3989e3 [HUDI-3689] Fix glob path and hive sync in deltastreamer tests (#5117)
* Remove glob pattern basePath from the deltastreamer tests.

* [HUDI-3689] Fix file scheme config for CI failure in TestHoodieRealTimeRecordReader

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-03-24 15:48:35 +05:30
Danny Chan
a1c42fcc07 [minor] Checks the data block type for archived timeline (#5106) 2022-03-24 14:10:43 +08:00
Sivabalan Narayanan
52f0498330 Fixing non partitioned all files record in MDT (#5108) 2022-03-23 19:26:39 -07:00
Sagar Sumit
f96ba7abf0 [HUDI-3642] Handle NPE due to empty requested replacecommit metadata (#5090) 2022-03-23 12:13:02 -07:00
Rajesh Mahindra
5f570ea151 [HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs (#4175)
- Refactor hive sync tool / config to use reflection and standardize configs

Co-authored-by: sivabalan <n.siva.b@gmail.com>
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-03-21 22:56:31 -04:00
Y Ethan Guo
9b6e138af2 [HUDI-3640] Set SimpleKeyGenerator as default in 2to3 table upgrade for Spark engine (#5075) 2022-03-21 20:35:06 -04:00
Pratyaksh Sharma
ca0931d332 [HUDI-1436]: Provide an option to trigger clean every nth commit (#4385)
- Provided option to trigger clean every nth commit with default number of commits as 1 so that existing users are not affected.
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-03-21 20:06:30 -04:00
wxp4532
26e5d2e6fc [HUDI-3559] Flink bucket index with COW table throws NoSuchElementException
The method FlinkWriteHelper#deduplicateRecords does not guarantee the order of records, but there is an
implicit constraint: all the records in one bucket should have the same bucket type (instant time here).
BucketStreamWriteFunction fails to comply with this constraint.

close apache/hudi#5018
2022-03-21 17:34:54 +08:00
Sivabalan Narayanan
a118d56b07 [MINOR] Fixing sparkUpdateNode for record generation (#5079) 2022-03-21 00:56:30 -04:00
Danny Chan
799c78e688 [HUDI-3665] Support flink multiple versions (#5072) 2022-03-21 10:34:50 +08:00
Y Ethan Guo
15d1c18625 [MINOR] Remove flaky assert in TestInLineFileSystem (#5069) 2022-03-20 18:58:30 -04:00
Alexey Kudinkin
1b6e201160 [HUDI-3663] Fixing Column Stats index to properly handle first Data Table commit (#5070)
* Fixed metadata conversion util to extract schema from `HoodieCommitMetadata`

* Fixed failure to fetch columns to index in empty table

* Abort indexing seq in case there are no columns to index

* Fallback to index at least primary key columns, in case no writer schema could be obtained to index all columns

* Fixed `getRecordFields` incorrectly ignoring default value

* Make sure Hudi metadata fields are also indexed
2022-03-20 10:24:13 +05:30
Alexey Kudinkin
099c2c099a [HUDI-3457] Refactored Spark DataSource Relations to avoid code duplication (#4877)
Refactoring Spark DataSource Relations to avoid code duplication. 

The following Relations were in scope:

- BaseFileOnlyViewRelation
- MergeOnReadSnapshotRelation
- MergeOnReadIncrementalRelation
2022-03-18 22:32:16 -07:00
Sivabalan Narayanan
316e38c71e [HUDI-3659] Reducing the validation frequency with integ tests (#5067) 2022-03-18 12:45:33 -04:00
Sivabalan Narayanan
2551c26183 [HUDI-3656] Adding medium sized dataset for clustering and minor fixes to integ tests (#5063) 2022-03-18 12:44:56 -04:00
JerryYue-M
6fe4d6e2f6 [HUDI-3598] Row Data to Hoodie Record Operator parallelism needs to always be consistent with input operator (#5049)
for chaining purpose

Co-authored-by: jerryyue <jerryyue@didiglobal.com>
2022-03-18 10:47:29 +08:00
RexAn
9ece77561a [MINOR] HoodieFileScanRDD could print null path (#5056)
Co-authored-by: Rex An <bonean131@gmail.com>
2022-03-17 12:53:45 -07:00
Raymond Xu
7446ff95a7 [HUDI-2439] Replace RDD with HoodieData in HoodieSparkTable and commit executors (#4856)
- Adopt HoodieData in Spark action commit executors
- Make Spark independent DeleteHelper, WriteHelper, MergeHelper in hudi-client-common
- Make HoodieTable in WriteClient APIs have raw type to decouple with Client's generic types
2022-03-17 04:17:56 -07:00
冯健
bf191f8d46 [HUDI-3645] Fix NPE caused by multiple threads accessing non-thread-safe HashMap (#5028)
- Change HashMap in HoodieROTablePathFilter to ConcurrentHashMap
2022-03-17 14:20:28 +05:30
Y Ethan Guo
5ba2d9ab2f [HUDI-3494] Consider triggering condition of MOR compaction during archival (#4974) 2022-03-17 01:28:11 -04:00
Y Ethan Guo
95e6e53810 [HUDI-3404] Automatically adjust write configs based on metadata table and write concurrency mode (#4975) 2022-03-17 01:25:04 -04:00
YueZhang
8ca9a54db0 [HUDI-3376] Add an option to skip under deletion files for HoodieMetadataTableValidator (#4994)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-03-16 18:31:00 -07:00
that's cool
91849c3d66 [HUDI-3607] Support backend switch in HoodieFlinkStreamer (#5032)
* [HUDI-3607] Support backend switch in HoodieFlinkStreamer

* [HUDI-3607] Support backend switch in HoodieFlinkStreamer
1. checkstyle fix

* [HUDI-3607] Support backend switch in HoodieFlinkStreamer
1. change the msg
2022-03-16 10:07:31 +04:00
Y Ethan Guo
296a0e6bcf [HUDI-3588] Remove hudi-common and hudi-hadoop-mr jars in Presto Docker image (#4997) 2022-03-15 18:49:30 -07:00
todd5167
55dca969f9 [HUDI-3589] flink sync hive metadata supports table properties and serde properties (#4995) 2022-03-15 23:56:37 +04:00
Sagar Sumit
d514570e90 [HUDI-3633] Allow non-string values to be set in TypedProperties (#5045)
* [HUDI-3633] Allow non-string values to be set in TypedProperties

* Override getProperty to ignore instanceof string check
2022-03-15 22:33:22 +04:00
Alexey Kudinkin
5e8ff8d793 [HUDI-3514] Rebase Data Skipping flow to rely on MT Column Stats index (#4948) 2022-03-15 10:38:36 -07:00
l-shen
9bdda2a312 [HUDI-3619] Fix HoodieOperation fromValue using wrong constant value (#5033)
Co-authored-by: root <l-shen@localhost.localdomain>
2022-03-15 16:34:31 +04:00
Thinking Chen
6ed7106e59 [HUDI-3606] Add org.objenesis:objenesis to hudi-timeline-server-bundle pom (#5017) 2022-03-15 15:06:50 +04:00
wangxianghu
3b59b76952 [HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string (#4987)
* [HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string

* add ut

* Address comment
2022-03-15 15:06:30 +04:00
Sivabalan Narayanan
d40adfa2d7 [HUDI-3620] Adding spark3.2.0 profile (#5038) 2022-03-14 19:14:00 -04:00
Sivabalan Narayanan
30cf39301e [HUDI-3623] Removing hive sync node from non hive yamls (#5040) 2022-03-14 18:39:26 -04:00
Sivabalan Narayanan
22c3ce73db [HUDI-3621] Fixing NullPointerException in DeltaStreamer (#5039) 2022-03-14 18:34:17 -04:00
wangxianghu
003c6ee73e [MINOR] Remove repeated kafka-clients dependencies (#5034) 2022-03-14 18:24:06 +04:00
peanut-chenzhong
4b75cb6f23 Fix NPE when running schedule using spark-sql if the commits time < hoodie.compact.inline.max.delta.commits (#4976)
* Update CompactionHoodiePathCommand.scala

Fix NPE when running schedule using spark-sql if the commits time < hoodie.compact.inline.max.delta.commits

* Update CompactionHoodiePathCommand.scala

Fix IndexOutOfBoundsException when there's no schedule for compaction

* Update CompactionHoodiePathCommand.scala

fix CI issue
2022-03-14 16:40:38 +08:00
Danny Chan
465d553df8 [HUDI-3600] Tweak the default cleaning strategy to be more streaming friendly for flink (#5010) 2022-03-14 14:22:07 +08:00