Bo Cui
a704e3740c
[HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (#5574)
* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink
2022-05-13 19:52:55 +08:00
Bo Cui
7fb436d3cf
[HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compact… (#5545)
* [HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compaction files
2022-05-13 14:32:48 +08:00
Xingcan Cui
8ad0bb9745
[MINOR] Fix a NPE for Option (#5461)
2022-05-13 12:20:40 +08:00
Bo Cui
701f8c039d
[HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (#5528)
* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink
2022-05-13 09:50:11 +08:00
Sivabalan Narayanan
0cec955fa2
[HUDI-4018][HUDI-4027] Adding integ test yamls for immutable use-cases. Added delete partition support to integ tests (#5501)
- Added pure immutable test yamls to integ test framework. Added SparkBulkInsertNode as part of it.
- Added delete_partition support to integ test framework using spark-datasource.
- Added a single yaml to test all non core write operations (insert overwrite, insert overwrite table and delete partitions)
- Added tests for 4 concurrent spark datasource writers (multi-writer tests).
- Fixed readme w/ sample commands for multi-writer.
2022-05-12 21:01:55 -04:00
YueZhang
ecd47e7aae
[HUDI-3963][Claim RFC number 53] Use Lock-Free Message Queue Improving Hoodie Writing Efficiency. (#5562)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-05-12 07:26:00 -04:00
Sivabalan Narayanan
b10ca7e69f
[HUDI-4085] Fixing flakiness with parquet empty batch tests in TestHoodieDeltaStreamer (#5559)
2022-05-11 16:02:54 -04:00
Jin Xing
7f0c1f3ddf
[HUDI-4079] Supports showing table comment for hudi with spark3 (#5546)
2022-05-11 22:28:58 +08:00
Alexey Kudinkin
4a8589f222
[HUDI-4038] Avoid calling getDataSize after every record written (#5497)
- getDataSize has non-trivial overhead in the current ParquetWriter impl, requiring traversal of already composed Column Groups in memory. Instead we can sample these calls to getDataSize to amortize its cost.
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-05-11 08:08:31 -04:00
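The sampling idea in HUDI-4038 can be illustrated with a small Java sketch. It is not Hudi's implementation: the class SampledSizeChecker, the bounds MIN_INTERVAL and MAX_INTERVAL, and the "re-check halfway to the estimated limit" heuristic are assumptions chosen for illustration, and the LongSupplier stands in for the costly ParquetWriter#getDataSize call.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not Hudi code): amortize an expensive size query by
 * calling it only every N records, re-estimating N from the average record size.
 */
class SampledSizeChecker {
  static final long MIN_INTERVAL = 100;    // assumed minimum records between size checks
  static final long MAX_INTERVAL = 10_000; // assumed maximum records between size checks

  private final long maxFileSize;
  private final LongSupplier expensiveSize; // stand-in for ParquetWriter#getDataSize
  private long recordsWritten = 0;
  private long nextCheckAt = MIN_INTERVAL;
  private long sizeCalls = 0;

  SampledSizeChecker(long maxFileSize, LongSupplier expensiveSize) {
    this.maxFileSize = maxFileSize;
    this.expensiveSize = expensiveSize;
  }

  /** Call once after each record is written; returns false once the file is full. */
  boolean afterWrite() {
    recordsWritten++;
    if (recordsWritten < nextCheckAt) {
      return true; // skip the costly size call for this record
    }
    long size = expensiveSize.getAsLong();
    sizeCalls++;
    if (size >= maxFileSize) {
      return false;
    }
    long avgRecordSize = Math.max(1, size / recordsWritten);
    long recordsUntilFull = (maxFileSize - size) / avgRecordSize;
    // Re-check roughly halfway to the estimated limit, clamped to [MIN_INTERVAL, MAX_INTERVAL].
    long interval = Math.min(MAX_INTERVAL, Math.max(MIN_INTERVAL, recordsUntilFull / 2));
    nextCheckAt = recordsWritten + interval;
    return true;
  }

  long sizeCalls() {
    return sizeCalls;
  }
}
```

The trade-off of this design is that, because checks are sampled, a file can overshoot the size limit by up to MIN_INTERVAL records; in exchange, the expensive call runs a few dozen times per file instead of once per record.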
4258a71517
[HUDI-4003] Try to read all the log file to parse schema (#5473)
2022-05-10 18:45:53 -04:00
aliceyyan
6fd21d0f10
[HUDI-4044] When reading data from flink-hudi to external storage, the … (#5516)
Co-authored-by: aliceyyan <aliceyyan@tencent.com>
2022-05-10 10:25:13 +08:00
Sivabalan Narayanan
6285a239a3
[HUDI-3995] Making perf optimizations for bulk insert row writer path (#5462)
- Avoid using udf for key generator for SimpleKeyGen and NonPartitionedKeyGen.
- Fixed NonPartitioned Key generator to directly fetch record key from row rather than involving GenericRecord.
- Other minor fixes around using static values instead of looking up hashmap.
2022-05-09 12:40:22 -04:00
xicm
6b47ef6ed2
[HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOpti… (#5526)
* [HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOptimized
Co-authored-by: xicm <xicm@asiainfo.com>
2022-05-09 16:35:50 +08:00
ForwardXu
4c70840275
[MINOR] Fixing close for HoodieCatalog's test (#5531)
* [MINOR] Fixing close for HoodieCatalog's test
2022-05-09 15:17:24 +08:00
guanziyue
75eaa0bffe
[HUDI-4055]refactor ratelimiter to avoid stack overflow (#5530)
2022-05-09 10:27:37 +08:00
Sivabalan Narayanan
569a76a9a5
[MINOR] fixing flaky tests in deltastreamer tests (#5521)
2022-05-07 15:37:20 -04:00
BruceLin
80f99893a0
[MINOR] Fixing class not found when using flink and enable metadata table (#5527)
2022-05-07 20:03:18 +08:00
cxzl25
9625d16937
[HUDI-3849] AvroDeserializer supports AVRO_REBASE_MODE_IN_READ configuration (#5287)
2022-05-07 15:39:14 +08:00
Sivabalan Narayanan
52fe1c9fae
[HUDI-3675] Adding post write termination strategy to deltastreamer continuous mode (#5073)
- Added a postWriteTerminationStrategy to deltastreamer continuous mode. One can enable by setting the appropriate termination strategy using DeltastreamerConfig.postWriteTerminationStrategyClass. If not, continuous mode is expected to run forever.
- Added one concrete impl for termination strategy as NoNewDataTerminationStrategy which shuts down deltastreamer if there is no new data to consume from source for N consecutive rounds.
2022-05-06 09:27:29 -04:00
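The "no new data for N consecutive rounds" rule described for NoNewDataTerminationStrategy can be sketched as a simple counter. The interface TerminationStrategySketch, the class NoNewDataTerminationSketch, and the boolean hadNewData input are hypothetical simplifications; they do not reproduce Hudi's actual signatures, which this log does not show.

```java
/** Hypothetical stand-in for a post-write termination hook in a continuous ingestion loop. */
interface TerminationStrategySketch {
  /** Called after every sync round; returns true when the loop should stop. */
  boolean shouldShutdown(boolean hadNewData);
}

/** Stops after maxRoundsWithoutData consecutive rounds that consumed no new data. */
class NoNewDataTerminationSketch implements TerminationStrategySketch {
  private final int maxRoundsWithoutData;
  private int consecutiveEmptyRounds = 0;

  NoNewDataTerminationSketch(int maxRoundsWithoutData) {
    if (maxRoundsWithoutData < 1) {
      throw new IllegalArgumentException("maxRoundsWithoutData must be >= 1");
    }
    this.maxRoundsWithoutData = maxRoundsWithoutData;
  }

  @Override
  public boolean shouldShutdown(boolean hadNewData) {
    if (hadNewData) {
      consecutiveEmptyRounds = 0; // any round with new data resets the streak
      return false;
    }
    consecutiveEmptyRounds++;
    return consecutiveEmptyRounds >= maxRoundsWithoutData;
  }
}
```

With N = 3, the round sequence "data, empty, empty, data, empty, empty, empty" triggers shutdown only on the last round, because the data round in the middle resets the count. When no strategy is configured, the loop never consults one and keeps running, as the commit describes.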
Raymond Xu
c319ee9cea
[HUDI-4017] Improve spark sql coverage in CI (#5512)
Add GitHub actions tasks to run spark sql UTs under spark 3.1 and 3.2.
2022-05-06 05:52:06 -07:00
Jin Xing
248b0591b0
[HUDI-4042] Support truncate-partition for Spark-3.2 (#5506)
2022-05-06 00:29:47 -07:00
guanziyue
abb4893b25
[HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor exit gracefully (#4264)
2022-05-05 13:49:34 -07:00
qianchutao
d794f4fbf9
[MINOR] Optimize code logic (#5499)
2022-05-05 09:33:06 -07:00
Y Ethan Guo
f66e83dc65
[HUDI-3667] Run unit tests of hudi-integ-tests in CI (#5078)
2022-05-04 23:39:18 -07:00
Sagar Sumit
1562bb658f
[HUDI-4031] Avoid clustering update handling when no pending replacecommit (#5487)
2022-05-04 10:17:11 -04:00
Raymond Xu
8c9209db28
[HUDI-4005] Update release scripts to help validation (#5479)
2022-05-04 10:15:54 -04:00
Sagar Sumit
3343cbb47b
[MINOR] Update RFC status (#5486)
2022-05-03 08:57:18 -07:00
Todd Gao
9732ba12da
[HUDI-3211][RFC-44] Add RFC for Hudi Connector for Presto (#4563)
* Add RFC doc
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
* Add note regarding catalog naming
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2022-05-02 22:05:23 +05:30
Raymond Xu
6af1ff7a66
[MINOR] Update DOAP for release 0.11.0 (#5467)
2022-04-30 10:51:16 -07:00
Wangyh
33ff4752ba
[HUDI-3978] Fix use of partition path field as hive partition field in flink (#5434)
* Fix partition path fields as hive sync partition fields error
2022-04-29 20:58:54 -07:00
xicm
f492c52ee4
[HUDI-3862] Fix default configurations of HoodieHBaseIndexConfig (#5308)
Co-authored-by: xicm <xicm@asiainfo.com>
2022-04-29 16:21:52 -07:00
Y Ethan Guo
a1d82b4dc5
[MINOR] Fix CI by ignoring SparkContext error (#5468)
Sets spark.driver.allowMultipleContexts = true when constructing Spark conf in UtilHelpers
2022-04-29 11:19:07 -07:00
吴祥平
e421d536ea
[HUDI-3758] Fix duplicate fileId error in MOR table type with flink bucket hash Index (#5185)
* fix duplicate fileId with bucket Index
* replace to load FileGroup from FileSystemView
2022-04-29 14:10:20 +08:00
Gary Li
b27e8b51d8
[MINOR] support different cleaning policy for flink (#5459)
2022-04-29 09:48:44 +08:00
LiChuang
4e928a6fe1
[HUDI-3943] Some description fixes for 0.10.1 docs (#5447)
2022-04-28 15:18:56 -07:00
Ibson
52953c8f5e
[HUDI-3815] Fix docs description of metadata.compaction.delta_commits default value error (#5368)
Co-authored-by: pusheng.li01 <pusheng.li01@liulishuo.com>
2022-04-27 16:09:44 -07:00
watermelon12138
cacbd98687
[HUDI-3945] After the async compaction operation is complete, the task should exit. (#5391)
Co-authored-by: y00617041 <yangxuan42@huawei.com>
2022-04-27 21:16:09 +08:00
huberylee
924e2e96a6
Claim RFC 52 for Introduce Secondary Index to Improve HUDI Query Performance (#5441)
2022-04-27 14:07:29 +08:00
Danny Chan
e1ccf2e00b
[HUDI-3977] Flink hudi table with date type partition path throws HoodieNotSupportedException (#5432)
2022-04-27 13:19:55 +08:00
KnightChess
6ec039ba42
[MINOR] Update alter rename command class type for pattern matching (#5381)
2022-04-26 19:39:51 -07:00
Yann Byron
77e333298d
[HUDI-3478] Claim RFC 51 For CDC (#5437)
2022-04-26 20:56:47 +05:30
Sivabalan Narayanan
762623a15c
[HUDI-3972] Fixing hoodie.properties/tableConfig for no preCombine field with writes (#5424)
Fixed instantiation of new table to set the null for preCombine if not explicitly set by the user.
2022-04-25 23:03:10 -04:00
Yuwei XIAO
f2ba0fead2
[HUDI-3085] Improve bulk insert partitioner abstraction (#4441)
2022-04-25 18:42:17 +08:00
ForwardXu
9054b85961
Revert "[HUDI-3951]support generan parameter 'sink.parallelism' for flink-hudi (#5405)" (#5421)
This reverts commit bda3db078e.
2022-04-25 12:58:27 +08:00
Ruguo Yu
d994c58cc0
[HUDI-3946] Validate option path in flink hudi sink (#5397)
2022-04-25 10:13:47 +08:00
hehuiyuan
bda3db078e
support generan parameter 'sink.parallelism' for flink-hudi (#5405)
Co-authored-by: hehuiyuan1 <hehuiyuan@jd.com>
2022-04-24 19:09:39 +08:00
miomiocat
5e5c177e4b
[HUDI-3923] Fix cast exception while reading boolean type of partitioned field (#5373)
2022-04-23 20:12:54 +08:00
Y Ethan Guo
8633bd6e06
[HUDI-3948] Fix presto bundle missing HBase classes (#5398)
2022-04-23 01:33:55 -07:00
Raymond Xu
505ee672ac
[HUDI-3950] add parquet-avro to gcp-bundle (#5399)
2022-04-23 11:59:49 +08:00
Sivabalan Narayanan
7523542c1d
[HUDI-3947] Fixing Hive conf usage in HoodieSparkSqlWriter (#5401)
2022-04-22 22:20:05 -04:00