ForwardXu
0802510ca9
[HUDI-2520] Fix drop partition issue when sync to hive ( #5147 )
2022-03-29 11:28:19 -07:00
Alexey Kudinkin
fcb003ec76
[HUDI-3731] Fixing Column Stats Index record Merging sequence missing columnName ( #5159 )
* Added `DataSkippingFailureMode` to control how Data Skipping handles failures in the flow (either "strict", where an exception is thrown, or "fallback", where it falls back to a full scan)
* Make sure tests execute in `DataSkippingFailureMode.Strict`
* Fixed Column Stats Index record merging sequence missing `columnName`
2022-03-29 21:09:56 +05:30
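The strict/fallback distinction described in the HUDI-3731 entry above can be illustrated with a minimal, language-agnostic sketch. This is not Hudi's implementation: `FailureMode`, `select_files`, and the `prune` callback are hypothetical names chosen only to show the control flow the commit message describes.

```python
from enum import Enum
from typing import Callable, List


class FailureMode(Enum):
    """Hypothetical stand-in for the two modes named in the commit message."""
    STRICT = "strict"
    FALLBACK = "fallback"


def select_files(
    all_files: List[str],
    prune: Callable[[List[str]], List[str]],
    mode: FailureMode,
) -> List[str]:
    """Try to prune files via an index lookup; handle failure per `mode`."""
    try:
        # Assumed: `prune` consults some index and may raise on failure.
        return prune(all_files)
    except Exception:
        if mode is FailureMode.STRICT:
            # Strict: surface the failure to the caller.
            raise
        # Fallback: give up on skipping and scan every file.
        return list(all_files)
```

Running tests in the strict mode, as the second bullet of that commit notes, means an index failure fails the test instead of being hidden by a silent full scan.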
Raymond Xu
1b2fb71afc
[MINOR] Move Experimental to javadoc ( #5161 )
2022-03-28 21:07:59 -07:00
Nicolas Paris
7c7ecb11d5
[HUDI-3736] Fix default dynamodblock url default value ( #4967 )
2022-03-28 20:31:46 -07:00
leesf
8f8a8158e2
[HUDI-2520] Fix drop table issue when sync to Hive ( #5143 )
2022-03-28 19:34:12 -07:00
Danny Chan
3bf9c5ffe8
[HUDI-3728] Set the sort operator parallelism for flink bucket bulk insert ( #5154 )
2022-03-29 09:52:35 +08:00
ForwardXu
72e0b52b18
[HUDI-3722] Fix truncate hudi table's error ( #5140 )
2022-03-29 09:44:18 +08:00
Sivabalan Narayanan
d074089c62
[HUDI-2566] Adding multi-writer test support to integ test ( #5065 )
2022-03-28 17:05:00 -04:00
Raymond Xu
6ccbae4d2a
[HUDI-2757] Implement Hudi AWS Glue sync ( #5076 )
2022-03-28 14:54:59 -04:00
Y Ethan Guo
4ed84b216d
[HUDI-3720] Fix the logic of reattempting pending rollback ( #5148 )
2022-03-28 14:54:31 -04:00
Shawy Geng
2e2d08cb72
[HUDI-3539] Flink bucket index bucketID bootstrap optimization. ( #5093 )
* [HUDI-3539] Flink bucket index bucketID bootstrap optimization.
Co-authored-by: gengxiaoyu <gengxiaoyu@bytedance.com>
2022-03-28 19:50:36 +08:00
huberylee
1d0f4ccfe0
[HUDI-3538] Support Compaction Command Based on Call Procedure Command for Spark SQL ( #4945 )
* Support Compaction Command Based on Call Procedure Command for Spark SQL
* Addressed review comments
2022-03-28 14:11:35 +08:00
ForwardXu
d31cde284c
[MINOR] Fix call command parser use spark3.2 ( #5144 )
2022-03-28 11:13:44 +08:00
Sivabalan Narayanan
f2a93ead3b
[HUDI-3724] Fixing closure of ParquetReader ( #5141 )
2022-03-28 09:36:15 +08:00
xiarixiaoyao
9da2dd416e
[HUDI-3719] High performance costs of AvroSerializer in DataSource wr… ( #5137 )
* [HUDI-3719] High performance costs of AvroSerializer in DataSource writing
* Add benchmark framework adapted from Spark
* Add avroSerDerBenchmark
2022-03-27 11:01:43 -07:00
Sivabalan Narayanan
85c4a6cfc1
[MINOR] Relaxing cleaner and archival configs ( #5142 )
2022-03-27 12:26:24 -04:00
Y Ethan Guo
484b3407e0
[HUDI-3604] Adjust the order of timeline changes in rollbacks ( #5114 )
2022-03-26 22:37:44 -07:00
Danny Chan
4d940bbf8a
[HUDI-3716] OOM occurred when use bulk_insert cow table with flink BUCKET index ( #5135 )
2022-03-27 09:13:58 +08:00
Alexey Kudinkin
189d5297b8
[HUDI-3709] Fixing ParquetWriter impls not respecting Parquet Max File Size limit ( #5129 )
2022-03-26 17:51:36 -04:00
RexAn
57b4f39c31
[HUDI-3612] Clustering strategy should create new TypedProperties when modifying it ( #5027 )
2022-03-26 16:16:03 +05:30
Danny Chan
0c09a973fb
[HUDI-3435] Do not throw exception when instant to rollback does not exist in metadata table active timeline ( #4821 )
2022-03-26 11:42:54 +08:00
Alexey Kudinkin
51034fecf1
[HUDI-3396] Refactoring MergeOnReadRDD to avoid duplication, fetch only projected columns ( #4888 )
2022-03-25 09:32:03 -07:00
ForwardXu
12cc8e715b
[MINOR] fix QuickstartUtils move ( #5133 )
2022-03-25 07:34:35 -07:00
ForwardXu
e5c3f9089b
[HUDI-3563] Make quickstart examples covered by CI tests ( #5082 )
2022-03-25 01:37:17 -07:00
wangxianghu
f20c9867d7
[HUDI-3711] Fix typo in MaxwellJsonKafkaSourcePostProcessor.Config#PRECOMBINE_FIELD_TYPE_PROP ( #5096 )
2022-03-25 00:02:54 -07:00
Alexey Kudinkin
8b38ddedc2
[HUDI-3594] Supporting Composite Expressions over Data Table Columns in Data Skipping flow ( #4996 )
2022-03-24 22:27:15 -07:00
Danny Chan
8896864d7b
[HUDI-3678] Fix record rewrite of create handle when 'preserveMetadata' is true ( #5088 )
2022-03-25 11:48:50 +08:00
Surya Prasanna
2fd9a4de5c
[HUDI-3580] Claim RFC number 48 for LogCompaction action RFC ( #5128 )
2022-03-24 20:26:04 -07:00
Zhaojing Yu
483ee843e6
[HUDI-3703] Reset taskID in restoreWriteMetadata ( #5122 )
2022-03-25 10:18:28 +08:00
Y Ethan Guo
eaa4c4f2e2
[HUDI-1180] Upgrade HBase to 2.4.9 ( #5004 )
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2022-03-24 19:04:53 -07:00
Danny Chan
5e86cdd1e9
[HUDI-3701] Flink bulk_insert support bucket hash index ( #5118 )
2022-03-25 09:01:42 +08:00
Y Ethan Guo
608d4bf32d
[HUDI-3638] Make ZookeeperBasedLockProvider serializable ( #5112 )
2022-03-24 17:59:47 -07:00
Y Ethan Guo
9b3dd2e0b7
[HUDI-3624] Check all instants before starting a commit in metadata table ( #5098 )
2022-03-24 17:13:58 -07:00
Y Ethan Guo
4ddd094ba2
[HUDI-3689] Disable flaky tests in TestHoodieDeltaStreamer ( #5127 )
2022-03-24 16:42:44 -07:00
Raymond Xu
ff136658a0
[HUDI-3689] Fix delta streamer tests ( #5124 )
2022-03-24 14:19:53 -07:00
Y Ethan Guo
44ab3b73ed
[HUDI-3706] Downgrade maven surefire and failsafe version ( #5123 )
2022-03-24 09:31:46 -07:00
Raymond Xu
686da41696
[HUDI-3689] Fix UT failures in TestHoodieDeltaStreamer ( #5120 )
2022-03-24 09:10:33 -07:00
Raymond Xu
b14706502b
[HUDI-3689] Remove Azure CI cache ( #5121 )
2022-03-24 05:39:11 -07:00
Alexey Kudinkin
ccc3728002
[HUDI-3684] Fixing NPE in ParquetUtils ( #5102 )
* Make sure nulls are properly handled in `HoodieColumnRangeMetadata`
2022-03-24 17:37:38 +05:30
Sagar Sumit
fe2c3989e3
[HUDI-3689] Fix glob path and hive sync in deltastreamer tests ( #5117 )
* Remove glob pattern basePath from the deltastreamer tests.
* [HUDI-3689] Fix file scheme config for CI failure in TestHoodieRealTimeRecordReader
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-03-24 15:48:35 +05:30
Danny Chan
a1c42fcc07
[minor] Checks the data block type for archived timeline ( #5106 )
2022-03-24 14:10:43 +08:00
Sivabalan Narayanan
52f0498330
Fixing non partitioned all files record in MDT ( #5108 )
2022-03-23 19:26:39 -07:00
Sagar Sumit
f96ba7abf0
[HUDI-3642] Handle NPE due to empty requested replacecommit metadata ( #5090 )
2022-03-23 12:13:02 -07:00
Rajesh Mahindra
5f570ea151
[HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs ( #4175 )
- Refactor hive sync tool / config to use reflection and standardize configs
Co-authored-by: sivabalan <n.siva.b@gmail.com>
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-03-21 22:56:31 -04:00
Y Ethan Guo
9b6e138af2
[HUDI-3640] Set SimpleKeyGenerator as default in 2to3 table upgrade for Spark engine ( #5075 )
2022-03-21 20:35:06 -04:00
Pratyaksh Sharma
ca0931d332
[HUDI-1436]: Provide an option to trigger clean every nth commit ( #4385 )
- Provided option to trigger clean every nth commit with default number of commits as 1 so that existing users are not affected.
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-03-21 20:06:30 -04:00
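The "clean every nth commit" option from the HUDI-1436 entry above can be sketched as a simple trigger check. This is an illustrative assumption, not Hudi's code: `should_trigger_clean` and its parameters are hypothetical names. The default of 1 mirrors the commit message's point that existing users keep cleaning after every commit.

```python
def should_trigger_clean(commits_since_last_clean: int, every_n_commits: int = 1) -> bool:
    """Return True once at least `every_n_commits` commits have landed since the last clean.

    With the default of 1, a clean is triggered after every commit,
    which matches the pre-existing behavior described in the commit message.
    """
    if every_n_commits < 1:
        raise ValueError("every_n_commits must be >= 1")
    return commits_since_last_clean >= every_n_commits
```

For example, with `every_n_commits=3`, the first two commits after a clean would not trigger another one, and the third would.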
wxp4532
26e5d2e6fc
[HUDI-3559] Flink bucket index with COW table throws NoSuchElementException
The method FlinkWriteHelper#deduplicateRecords does not guarantee the record sequence, but there is an
implicit constraint: all the records in one bucket should have the same bucket type (instant time here).
BucketStreamWriteFunction fails to comply with this constraint.
close apache/hudi#5018
2022-03-21 17:34:54 +08:00
Sivabalan Narayanan
a118d56b07
[MINOR] Fixing sparkUpdateNode for record generation ( #5079 )
2022-03-21 00:56:30 -04:00
Danny Chan
799c78e688
[HUDI-3665] Support flink multiple versions ( #5072 )
2022-03-21 10:34:50 +08:00
Y Ethan Guo
15d1c18625
[MINOR] Remove flaky assert in TestInLineFileSystem ( #5069 )
2022-03-20 18:58:30 -04:00