Raymond Xu
686da41696
[HUDI-3689] Fix UT failures in TestHoodieDeltaStreamer (#5120)
2022-03-24 09:10:33 -07:00
Raymond Xu
b14706502b
[HUDI-3689] Remove Azure CI cache (#5121)
2022-03-24 05:39:11 -07:00
Alexey Kudinkin
ccc3728002
[HUDI-3684] Fixing NPE in ParquetUtils (#5102)
* Make sure nulls are properly handled in `HoodieColumnRangeMetadata`
2022-03-24 17:37:38 +05:30
Sagar Sumit
fe2c3989e3
[HUDI-3689] Fix glob path and hive sync in deltastreamer tests (#5117)
* Remove glob pattern basePath from the deltastreamer tests.
* [HUDI-3689] Fix file scheme config for CI failure in TestHoodieRealTimeRecordReader
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-03-24 15:48:35 +05:30
Danny Chan
a1c42fcc07
[MINOR] Check the data block type for archived timeline (#5106)
2022-03-24 14:10:43 +08:00
Sivabalan Narayanan
52f0498330
Fixing non-partitioned all files record in MDT (#5108)
2022-03-23 19:26:39 -07:00
Sagar Sumit
f96ba7abf0
[HUDI-3642] Handle NPE due to empty requested replacecommit metadata (#5090)
2022-03-23 12:13:02 -07:00
Rajesh Mahindra
5f570ea151
[HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs (#4175)
- Refactor hive sync tool / config to use reflection and standardize configs
Co-authored-by: sivabalan <n.siva.b@gmail.com>
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-03-21 22:56:31 -04:00
Y Ethan Guo
9b6e138af2
[HUDI-3640] Set SimpleKeyGenerator as default in 2to3 table upgrade for Spark engine (#5075)
2022-03-21 20:35:06 -04:00
Pratyaksh Sharma
ca0931d332
[HUDI-1436] Provide an option to trigger clean every nth commit (#4385)
- Provide an option to trigger clean every nth commit, with a default of 1 commit so that existing users are not affected.
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-03-21 20:06:30 -04:00
wxp4532
26e5d2e6fc
[HUDI-3559] Flink bucket index with COW table throws NoSuchElementException
The FlinkWriteHelper#deduplicateRecords method does not guarantee record order, but there is an
implicit constraint: all records in one bucket should have the same bucket type (the instant time here).
BucketStreamWriteFunction breaks this rule and fails to comply with the constraint.
close apache/hudi#5018
2022-03-21 17:34:54 +08:00
Sivabalan Narayanan
a118d56b07
[MINOR] Fixing sparkUpdateNode for record generation (#5079)
2022-03-21 00:56:30 -04:00
Danny Chan
799c78e688
[HUDI-3665] Support multiple Flink versions (#5072)
2022-03-21 10:34:50 +08:00
Y Ethan Guo
15d1c18625
[MINOR] Remove flaky assert in TestInLineFileSystem (#5069)
2022-03-20 18:58:30 -04:00
Alexey Kudinkin
1b6e201160
[HUDI-3663] Fixing Column Stats index to properly handle first Data Table commit (#5070)
* Fixed metadata conversion util to extract schema from `HoodieCommitMetadata`
* Fixed failure to fetch columns to index in empty table
* Abort indexing seq in case there are no columns to index
* Fallback to index at least primary key columns, in case no writer schema could be obtained to index all columns
* Fixed `getRecordFields` incorrectly ignoring default value
* Make sure Hudi metadata fields are also indexed
2022-03-20 10:24:13 +05:30
Alexey Kudinkin
099c2c099a
[HUDI-3457] Refactored Spark DataSource Relations to avoid code duplication (#4877)
Refactoring Spark DataSource Relations to avoid code duplication.
The following Relations were in scope:
- BaseFileOnlyViewRelation
- MergeOnReadSnapshotRelation
- MergeOnReadIncrementalRelation
2022-03-18 22:32:16 -07:00
Sivabalan Narayanan
316e38c71e
[HUDI-3659] Reducing the validation frequency in integ tests (#5067)
2022-03-18 12:45:33 -04:00
Sivabalan Narayanan
2551c26183
[HUDI-3656] Adding medium sized dataset for clustering and minor fixes to integ tests (#5063)
2022-03-18 12:44:56 -04:00
JerryYue-M
6fe4d6e2f6
[HUDI-3598] Row Data to Hoodie Record Operator parallelism needs to always be consistent with input operator (#5049)
for chaining purposes
Co-authored-by: jerryyue <jerryyue@didiglobal.com>
2022-03-18 10:47:29 +08:00
RexAn
9ece77561a
[MINOR] HoodieFileScanRDD could print null path (#5056)
Co-authored-by: Rex An <bonean131@gmail.com>
2022-03-17 12:53:45 -07:00
Raymond Xu
7446ff95a7
[HUDI-2439] Replace RDD with HoodieData in HoodieSparkTable and commit executors (#4856)
- Adopt HoodieData in Spark action commit executors
- Make DeleteHelper, WriteHelper, and MergeHelper in hudi-client-common Spark-independent
- Use a raw HoodieTable type in WriteClient APIs to decouple it from the client's generic types
2022-03-17 04:17:56 -07:00
冯健
bf191f8d46
[HUDI-3645] Fix NPE caused by multiple threads accessing non-thread-safe HashMap (#5028)
- Change HashMap in HoodieROTablePathFilter to ConcurrentHashMap
2022-03-17 14:20:28 +05:30
Y Ethan Guo
5ba2d9ab2f
[HUDI-3494] Consider triggering condition of MOR compaction during archival (#4974)
2022-03-17 01:28:11 -04:00
Y Ethan Guo
95e6e53810
[HUDI-3404] Automatically adjust write configs based on metadata table and write concurrency mode (#4975)
2022-03-17 01:25:04 -04:00
YueZhang
8ca9a54db0
[HUDI-3376] Add an option to skip files under deletion for HoodieMetadataTableValidator (#4994)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-03-16 18:31:00 -07:00
that's cool
91849c3d66
[HUDI-3607] Support backend switch in HoodieFlinkStreamer (#5032)
* [HUDI-3607] Support backend switch in HoodieFlinkStreamer
* [HUDI-3607] Support backend switch in HoodieFlinkStreamer
1. checkstyle fix
* [HUDI-3607] Support backend switch in HoodieFlinkStreamer
1. change the msg
2022-03-16 10:07:31 +04:00
Y Ethan Guo
296a0e6bcf
[HUDI-3588] Remove hudi-common and hudi-hadoop-mr jars in Presto Docker image (#4997)
2022-03-15 18:49:30 -07:00
todd5167
55dca969f9
[HUDI-3589] Flink Hive metadata sync supports table properties and SerDe properties (#4995)
2022-03-15 23:56:37 +04:00
Sagar Sumit
d514570e90
[HUDI-3633] Allow non-string values to be set in TypedProperties (#5045)
* [HUDI-3633] Allow non-string values to be set in TypedProperties
* Override getProperty to ignore instanceof string check
2022-03-15 22:33:22 +04:00
Alexey Kudinkin
5e8ff8d793
[HUDI-3514] Rebase Data Skipping flow to rely on MT Column Stats index (#4948)
2022-03-15 10:38:36 -07:00
l-shen
9bdda2a312
[HUDI-3619] Fix HoodieOperation fromValue using wrong constant value (#5033)
Co-authored-by: root <l-shen@localhost.localdomain>
2022-03-15 16:34:31 +04:00
Thinking Chen
6ed7106e59
[HUDI-3606] Add org.objenesis:objenesis to hudi-timeline-server-bundle pom (#5017)
2022-03-15 15:06:50 +04:00
wangxianghu
3b59b76952
[HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string (#4987)
* [HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string
* Add unit tests
* Address comment
2022-03-15 15:06:30 +04:00
Sivabalan Narayanan
d40adfa2d7
[HUDI-3620] Adding spark3.2.0 profile (#5038)
2022-03-14 19:14:00 -04:00
Sivabalan Narayanan
30cf39301e
[HUDI-3623] Removing hive sync node from non hive yamls (#5040)
2022-03-14 18:39:26 -04:00
Sivabalan Narayanan
22c3ce73db
[HUDI-3621] Fixing NullPointerException in DeltaStreamer (#5039)
2022-03-14 18:34:17 -04:00
wangxianghu
003c6ee73e
[MINOR] Remove repeated kafka-clients dependencies (#5034)
2022-03-14 18:24:06 +04:00
peanut-chenzhong
4b75cb6f23
Fix NPE when running schedule via Spark SQL if the number of commits is less than hoodie.compact.inline.max.delta.commits (#4976)
* Update CompactionHoodiePathCommand.scala
Fix NPE when running schedule via Spark SQL if the number of commits is less than hoodie.compact.inline.max.delta.commits
* Update CompactionHoodiePathCommand.scala
Fix IndexOutOfBoundsException when no compaction is scheduled
* Update CompactionHoodiePathCommand.scala
Fix CI issue
2022-03-14 16:40:38 +08:00
Danny Chan
465d553df8
[HUDI-3600] Tweak the default cleaning strategy to be more streaming friendly for flink (#5010)
2022-03-14 14:22:07 +08:00
Sivabalan Narayanan
1ba8220617
[HUDI-3613] Adding/fixing yamls for metadata (#5029)
2022-03-13 21:11:37 -04:00
ForwardXu
6c8224cae6
[HUDI-3501] Support savepoints command based on Call Procedure Command (#5025)
2022-03-13 16:58:21 +04:00
liujinhui
e60acc1258
[HUDI-3583] Fix MarkerBasedRollbackStrategy NoSuchElementException (#4984)
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
2022-03-12 23:00:50 -08:00
Sagar Sumit
eee96e9af3
[HUDI-3593] Restore TypedProperties and flush checksum in table config (#5013)
Create new TypedProperties while performing clustering
Add OrderedProperties and minor refactoring
Add javadoc and remove getters from OrderedProperties
2022-03-13 07:58:55 +05:30
Sivabalan Narayanan
e7bb0413af
[HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way (#4971)
2022-03-11 18:40:13 -05:00
wangxianghu
e8918b6c2c
[HUDI-3569] Introduce ChainedJsonKafkaSourcePostProcessor to support setting multiple processors at once (#4969)
2022-03-11 17:49:30 -05:00
RexAn
93277b2bcd
[HUDI-3592] Fix NPE in DefaultHoodieRecordPayload if properties are empty (#4999)
Co-authored-by: Rex An <bonean131@gmail.com>
2022-03-11 17:45:40 -05:00
Alexey Kudinkin
5d59bf67ae
[HUDI-3513] Make sure Column Stats does not fail in case it fails to load previous Index Table state (#5015)
2022-03-11 17:39:22 -05:00
huberylee
56cb49485d
[HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable (#4982)
2022-03-11 13:23:19 -08:00
wangxianghu
b00180342e
[HUDI-3575] Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor (#5019)
2022-03-11 15:03:42 +04:00
苏承祥
faed6996ee
[HUDI-3566] Add thread factory in BoundedInMemoryExecutor (#4926)
Co-authored-by: 苏承祥 <sucx@tuya.com>
2022-03-11 18:58:49 +08:00