Sivabalan Narayanan
85c4a6cfc1
[MINOR] Relaxing cleaner and archival configs ( #5142 )
2022-03-27 12:26:24 -04:00
Y Ethan Guo
484b3407e0
[HUDI-3604] Adjust the order of timeline changes in rollbacks ( #5114 )
2022-03-26 22:37:44 -07:00
Danny Chan
4d940bbf8a
[HUDI-3716] OOM occurred when using bulk_insert on COW table with Flink BUCKET index ( #5135 )
2022-03-27 09:13:58 +08:00
Alexey Kudinkin
189d5297b8
[HUDI-3709] Fixing ParquetWriter impls not respecting Parquet Max File Size limit ( #5129 )
2022-03-26 17:51:36 -04:00
RexAn
57b4f39c31
[HUDI-3612] Clustering strategy should create new TypedProperties when modifying it ( #5027 )
2022-03-26 16:16:03 +05:30
Danny Chan
0c09a973fb
[HUDI-3435] Do not throw exception when instant to rollback does not exist in metadata table active timeline ( #4821 )
2022-03-26 11:42:54 +08:00
Alexey Kudinkin
51034fecf1
[HUDI-3396] Refactoring MergeOnReadRDD to avoid duplication, fetch only projected columns ( #4888 )
2022-03-25 09:32:03 -07:00
ForwardXu
12cc8e715b
[MINOR] fix QuickstartUtils move ( #5133 )
2022-03-25 07:34:35 -07:00
ForwardXu
e5c3f9089b
[HUDI-3563] Make quickstart examples covered by CI tests ( #5082 )
2022-03-25 01:37:17 -07:00
wangxianghu
f20c9867d7
[HUDI-3711] Fix typo in MaxwellJsonKafkaSourcePostProcessor.Config#PRECOMBINE_FIELD_TYPE_PROP ( #5096 )
2022-03-25 00:02:54 -07:00
Alexey Kudinkin
8b38ddedc2
[HUDI-3594] Supporting Composite Expressions over Data Table Columns in Data Skipping flow ( #4996 )
2022-03-24 22:27:15 -07:00
Danny Chan
8896864d7b
[HUDI-3678] Fix record rewrite of create handle when 'preserveMetadata' is true ( #5088 )
2022-03-25 11:48:50 +08:00
Surya Prasanna
2fd9a4de5c
[HUDI-3580] Claim RFC number 48 for LogCompaction action RFC ( #5128 )
2022-03-24 20:26:04 -07:00
Zhaojing Yu
483ee843e6
[HUDI-3703] Reset taskID in restoreWriteMetadata ( #5122 )
2022-03-25 10:18:28 +08:00
Y Ethan Guo
eaa4c4f2e2
[HUDI-1180] Upgrade HBase to 2.4.9 ( #5004 )
...
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2022-03-24 19:04:53 -07:00
Danny Chan
5e86cdd1e9
[HUDI-3701] Flink bulk_insert support bucket hash index ( #5118 )
2022-03-25 09:01:42 +08:00
Y Ethan Guo
608d4bf32d
[HUDI-3638] Make ZookeeperBasedLockProvider serializable ( #5112 )
2022-03-24 17:59:47 -07:00
Y Ethan Guo
9b3dd2e0b7
[HUDI-3624] Check all instants before starting a commit in metadata table ( #5098 )
2022-03-24 17:13:58 -07:00
Y Ethan Guo
4ddd094ba2
[HUDI-3689] Disable flaky tests in TestHoodieDeltaStreamer ( #5127 )
2022-03-24 16:42:44 -07:00
Raymond Xu
ff136658a0
[HUDI-3689] Fix delta streamer tests ( #5124 )
2022-03-24 14:19:53 -07:00
Y Ethan Guo
44ab3b73ed
[HUDI-3706] Downgrade maven surefire and failsafe version ( #5123 )
2022-03-24 09:31:46 -07:00
Raymond Xu
686da41696
[HUDI-3689] Fix UT failures in TestHoodieDeltaStreamer ( #5120 )
2022-03-24 09:10:33 -07:00
Raymond Xu
b14706502b
[HUDI-3689] Remove Azure CI cache ( #5121 )
2022-03-24 05:39:11 -07:00
Alexey Kudinkin
ccc3728002
[HUDI-3684] Fixing NPE in ParquetUtils ( #5102 )
...
* Make sure nulls are properly handled in `HoodieColumnRangeMetadata`
2022-03-24 17:37:38 +05:30
Sagar Sumit
fe2c3989e3
[HUDI-3689] Fix glob path and hive sync in deltastreamer tests ( #5117 )
...
* Remove glob pattern basePath from the deltastreamer tests.
* [HUDI-3689] Fix file scheme config for CI failure in TestHoodieRealTimeRecordReader
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-03-24 15:48:35 +05:30
Danny Chan
a1c42fcc07
[MINOR] Checks the data block type for archived timeline ( #5106 )
2022-03-24 14:10:43 +08:00
Sivabalan Narayanan
52f0498330
Fixing non partitioned all files record in MDT ( #5108 )
2022-03-23 19:26:39 -07:00
Sagar Sumit
f96ba7abf0
[HUDI-3642] Handle NPE due to empty requested replacecommit metadata ( #5090 )
2022-03-23 12:13:02 -07:00
Rajesh Mahindra
5f570ea151
[HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs ( #4175 )
...
- Refactor hive sync tool / config to use reflection and standardize configs
Co-authored-by: sivabalan <n.siva.b@gmail.com>
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-03-21 22:56:31 -04:00
Y Ethan Guo
9b6e138af2
[HUDI-3640] Set SimpleKeyGenerator as default in 2to3 table upgrade for Spark engine ( #5075 )
2022-03-21 20:35:06 -04:00
Pratyaksh Sharma
ca0931d332
[HUDI-1436]: Provide an option to trigger clean every nth commit ( #4385 )
...
- Provided an option to trigger clean every nth commit, with the default number of commits set to 1 so that existing users are not affected.
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-03-21 20:06:30 -04:00
wxp4532
26e5d2e6fc
[HUDI-3559] Flink bucket index with COW table throws NoSuchElementException
...
The method FlinkWriteHelper#deduplicateRecords does not guarantee the record order, but there is an
implicit constraint: all the records in one bucket should have the same bucket type (instant time here).
BucketStreamWriteFunction breaks this constraint.
close apache/hudi#5018
2022-03-21 17:34:54 +08:00
Sivabalan Narayanan
a118d56b07
[MINOR] Fixing sparkUpdateNode for record generation ( #5079 )
2022-03-21 00:56:30 -04:00
Danny Chan
799c78e688
[HUDI-3665] Support flink multiple versions ( #5072 )
2022-03-21 10:34:50 +08:00
Y Ethan Guo
15d1c18625
[MINOR] Remove flaky assert in TestInLineFileSystem ( #5069 )
2022-03-20 18:58:30 -04:00
Alexey Kudinkin
1b6e201160
[HUDI-3663] Fixing Column Stats index to properly handle first Data Table commit ( #5070 )
...
* Fixed metadata conversion util to extract schema from `HoodieCommitMetadata`
* Fixed failure to fetch columns to index in empty table
* Abort indexing sequence in case there are no columns to index
* Fall back to indexing at least primary key columns, in case no writer schema could be obtained to index all columns
* Fixed `getRecordFields` incorrectly ignoring default value
* Make sure Hudi metadata fields are also indexed
2022-03-20 10:24:13 +05:30
Alexey Kudinkin
099c2c099a
[HUDI-3457] Refactored Spark DataSource Relations to avoid code duplication ( #4877 )
...
Refactoring Spark DataSource Relations to avoid code duplication.
The following Relations were in scope:
- BaseFileOnlyViewRelation
- MergeOnReadSnapshotRelation
- MergeOnReadIncrementalRelation
2022-03-18 22:32:16 -07:00
Sivabalan Narayanan
316e38c71e
[HUDI-3659] Reducing the validation frequency with integ tests ( #5067 )
2022-03-18 12:45:33 -04:00
Sivabalan Narayanan
2551c26183
[HUDI-3656] Adding medium sized dataset for clustering and minor fixes to integ tests ( #5063 )
2022-03-18 12:44:56 -04:00
JerryYue-M
6fe4d6e2f6
[HUDI-3598] Row Data to Hoodie Record Operator parallelism needs to always be consistent with input operator ( #5049 )
...
for chaining purposes
Co-authored-by: jerryyue <jerryyue@didiglobal.com>
2022-03-18 10:47:29 +08:00
RexAn
9ece77561a
[MINOR] HoodieFileScanRDD could print null path ( #5056 )
...
Co-authored-by: Rex An <bonean131@gmail.com>
2022-03-17 12:53:45 -07:00
Raymond Xu
7446ff95a7
[HUDI-2439] Replace RDD with HoodieData in HoodieSparkTable and commit executors ( #4856 )
...
- Adopt HoodieData in Spark action commit executors
- Make DeleteHelper, WriteHelper, MergeHelper in hudi-client-common independent of Spark
- Make HoodieTable in WriteClient APIs have raw type to decouple it from the client's generic types
2022-03-17 04:17:56 -07:00
冯健
bf191f8d46
[HUDI-3645] Fix NPE caused by multiple threads accessing non-thread-safe HashMap ( #5028 )
...
- Change HashMap in HoodieROTablePathFilter to ConcurrentHashMap
2022-03-17 14:20:28 +05:30
Y Ethan Guo
5ba2d9ab2f
[HUDI-3494] Consider triggering condition of MOR compaction during archival ( #4974 )
2022-03-17 01:28:11 -04:00
Y Ethan Guo
95e6e53810
[HUDI-3404] Automatically adjust write configs based on metadata table and write concurrency mode ( #4975 )
2022-03-17 01:25:04 -04:00
YueZhang
8ca9a54db0
[HUDI-3376] Add an option to skip under deletion files for HoodieMetadataTableValidator ( #4994 )
...
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-03-16 18:31:00 -07:00
that's cool
91849c3d66
[HUDI-3607] Support backend switch in HoodieFlinkStreamer ( #5032 )
...
* Checkstyle fix
* Change the message
2022-03-16 10:07:31 +04:00
Y Ethan Guo
296a0e6bcf
[HUDI-3588] Remove hudi-common and hudi-hadoop-mr jars in Presto Docker image ( #4997 )
2022-03-15 18:49:30 -07:00
todd5167
55dca969f9
[HUDI-3589] flink sync hive metadata supports table properties and serde properties ( #4995 )
2022-03-15 23:56:37 +04:00
Sagar Sumit
d514570e90
[HUDI-3633] Allow non-string values to be set in TypedProperties ( #5045 )
...
* Override getProperty to ignore the instanceof String check
2022-03-15 22:33:22 +04:00