Commit Graph

3007 Commits

Author SHA1 Message Date
BruceLin
99555c897a [HUDI-4110] Clean the marker files for flink compaction (#5604) 2022-05-17 21:09:27 +08:00
Jin Xing
d422f69a0d [HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand (#5564)
* [HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand

* Set hoodie.query.as.ro.table in serde properties
2022-05-17 14:12:50 +08:00
Danny Chan
d52d13302d [HUDI-4101] BucketIndexPartitioner should take partition path for better dispersion (#5590) 2022-05-17 10:34:57 +08:00
Danny Chan
fdd96cc97e [HUDI-4104] DeltaWriteProfile includes the pending compaction file slice when deciding small buckets (#5594) 2022-05-17 10:34:15 +08:00
Shawy Geng
ad773b3d96 [HUDI-3654] Preparations for hudi metastore. (#5572)
* [HUDI-3654] Preparations for hudi metastore.

Co-authored-by: gengxiaoyu <gengxiaoyu@bytedance.com>
2022-05-17 09:47:10 +08:00
董可伦
a7a42e4490 [HUDI-4103] [HUDI-4001] Filter the properties should not be used when create table for Spark SQL 2022-05-16 23:26:23 +08:00
Danny Chan
43e08193ef [HUDI-4098] Metadata table heartbeat for instant has expired, last heartbeat 0 (#5583) 2022-05-16 17:40:08 +08:00
Yuwei XIAO
61030d8e7a [HUDI-3123] consistent hashing index: basic write path (upsert/insert) (#4480)
1. basic write path(insert/upsert) implementation
2. adapt simple bucket index
2022-05-16 11:07:01 +08:00
陈浩
1fded18dff fix hive sync no partition table error (#5585) 2022-05-16 09:51:24 +08:00
董可伦
75f847691f [HUDI-4001] Filter the properties should not be used when create table for Spark SQL (#5495) 2022-05-16 09:50:29 +08:00
xi chaomin
6e16e719cd [HUDI-3980] Support kerberos hbase index (#5464)
- Add configurations in HoodieHBaseIndexConfig.java to support kerberos hbase connection.

Co-authored-by: xicm <xicm@asiainfo.com>
2022-05-14 07:37:31 -04:00
wqwl611
52e63b39d6 [HUDI-4097] add table info to jobStatus (#5529)
Co-authored-by: wqwl611 <wqwl611@gmail.com>
2022-05-13 21:01:15 -04:00
Sivabalan Narayanan
5c4813f101 [HUDI-4072] Fix NULL schema for empty batches in deltastreamer (#5543) 2022-05-13 17:56:47 +05:30
Bo Cui
a704e3740c [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (#5574)
* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink
2022-05-13 19:52:55 +08:00
Bo Cui
7fb436d3cf [HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compact… (#5545)
* [HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compaction files
2022-05-13 14:32:48 +08:00
Xingcan Cui
8ad0bb9745 [MINOR] Fix a NPE for Option (#5461) 2022-05-13 12:20:40 +08:00
Bo Cui
701f8c039d [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (#5528)
* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink
2022-05-13 09:50:11 +08:00
Sivabalan Narayanan
0cec955fa2 [HUDI-4018][HUDI-4027] Adding integ test yamls for immutable use-cases. Added delete partition support to integ tests (#5501)
- Added pure immutable test yamls to integ test framework. Added SparkBulkInsertNode as part of it.
- Added delete_partition support to integ test framework using spark-datasource.
- Added a single yaml to test all non core write operations (insert overwrite, insert overwrite table and delete partitions)
- Added tests for 4 concurrent spark datasource writers (multi-writer tests).
- Fixed readme w/ sample commands for multi-writer.
2022-05-12 21:01:55 -04:00
YueZhang
ecd47e7aae [HUDI-3963][Claim RFC number 53] Use Lock-Free Message Queue Improving Hoodie Writing Efficiency. (#5562)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-05-12 07:26:00 -04:00
Sivabalan Narayanan
b10ca7e69f [HUDI-4085] Fixing flakiness with parquet empty batch tests in TestHoodieDeltaStreamer (#5559) 2022-05-11 16:02:54 -04:00
Jin Xing
7f0c1f3ddf [HUDI-4079] Supports showing table comment for hudi with spark3 (#5546) 2022-05-11 22:28:58 +08:00
Alexey Kudinkin
4a8589f222 [HUDI-4038] Avoid calling getDataSize after every record written (#5497)
- getDataSize has non-trivial overhead in the current ParquetWriter impl, requiring traversal of already composed Column Groups in memory. Instead we can sample these calls to getDataSize to amortize its cost.

Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-05-11 08:08:31 -04:00
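The sampling idea in the HUDI-4038 message above can be illustrated with a minimal sketch. It is not Hudi's implementation: `SampledSizeChecker`, `get_data_size`, `max_size_bytes`, and `sample_interval` are hypothetical names chosen for illustration, and the fixed-interval policy is an assumption about how the calls might be sampled.

```python
class SampledSizeChecker:
    """Hypothetical helper: consults an expensive size function only once
    every `sample_interval` writes instead of after every record."""

    def __init__(self, get_data_size, max_size_bytes, sample_interval=100):
        self._get_data_size = get_data_size    # expensive callable (assumed)
        self._max_size_bytes = max_size_bytes  # size limit for the file
        self._sample_interval = sample_interval
        self._writes_since_check = 0
        self.size_calls = 0                    # how often the costly call ran

    def can_write(self):
        """Return True if another record may be written."""
        self._writes_since_check += 1
        if self._writes_since_check < self._sample_interval:
            # Skip the expensive call and assume there is still room.
            return True
        self._writes_since_check = 0
        self.size_calls += 1
        return self._get_data_size() < self._max_size_bytes
```

The trade-off in this sketch is that the writer can overshoot the limit by up to `sample_interval - 1` records, in exchange for calling the size function roughly once per `sample_interval` writes.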
4258a71517 [HUDI-4003] Try to read all the log file to parse schema (#5473) 2022-05-10 18:45:53 -04:00
aliceyyan
6fd21d0f10 [HUDI-4044] When reading data from flink-hudi to external storage, the … (#5516)
Co-authored-by: aliceyyan <aliceyyan@tencent.com>
2022-05-10 10:25:13 +08:00
Sivabalan Narayanan
6285a239a3 [HUDI-3995] Making perf optimizations for bulk insert row writer path (#5462)
- Avoid using udf for key generator for SimpleKeyGen and NonPartitionedKeyGen.
- Fixed NonPartitioned Key generator to directly fetch record key from row rather than involving GenericRecord.
- Other minor fixes around using static values instead of looking up hashmap.
2022-05-09 12:40:22 -04:00
xicm
6b47ef6ed2 [HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOpti… (#5526)
* [HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOptimized

Co-authored-by: xicm <xicm@asiainfo.com>
2022-05-09 16:35:50 +08:00
ForwardXu
4c70840275 [MINOR] Fixing close for HoodieCatalog's test (#5531)
* [MINOR] Fixing close for HoodieCatalog's test
2022-05-09 15:17:24 +08:00
guanziyue
75eaa0bffe [HUDI-4055]refactor ratelimiter to avoid stack overflow (#5530) 2022-05-09 10:27:37 +08:00
Sivabalan Narayanan
569a76a9a5 [MINOR] fixing flaky tests in deltastreamer tests (#5521) 2022-05-07 15:37:20 -04:00
BruceLin
80f99893a0 [MINOR] Fixing class not found when using flink and enable metadata table (#5527) 2022-05-07 20:03:18 +08:00
cxzl25
9625d16937 [HUDI-3849] AvroDeserializer supports AVRO_REBASE_MODE_IN_READ configuration (#5287) 2022-05-07 15:39:14 +08:00
Sivabalan Narayanan
52fe1c9fae [HUDI-3675] Adding post write termination strategy to deltastreamer continuous mode (#5073)
- Added a postWriteTerminationStrategy to deltastreamer continuous mode. One can enable by setting the appropriate termination strategy using DeltastreamerConfig.postWriteTerminationStrategyClass. If not, continuous mode is expected to run forever.
- Added one concrete impl for termination strategy as NoNewDataTerminationStrategy which shuts down deltastreamer if there is no new data to consume from source for N consecutive rounds.
2022-05-06 09:27:29 -04:00
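The "no new data for N consecutive rounds" rule described in the HUDI-3675 message above can be sketched as follows. This is an illustrative assumption, not the code from the commit: the class name `NoNewDataTermination`, the method `should_shutdown`, and the parameter `max_empty_rounds` are hypothetical.

```python
class NoNewDataTermination:
    """Hypothetical strategy: signal shutdown after `max_empty_rounds`
    consecutive sync rounds that consumed no new records."""

    def __init__(self, max_empty_rounds):
        self._max_empty_rounds = max_empty_rounds
        self._empty_rounds = 0

    def should_shutdown(self, records_in_round):
        """Call once after each write round with the number of records consumed."""
        if records_in_round > 0:
            # Any new data resets the streak, so only consecutive
            # empty rounds count toward termination.
            self._empty_rounds = 0
        else:
            self._empty_rounds += 1
        return self._empty_rounds >= self._max_empty_rounds
```

A continuous-mode loop would call `should_shutdown` after every round and exit once it returns `True`; without any strategy configured, the loop would run forever, as the commit message states.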
Raymond Xu
c319ee9cea [HUDI-4017] Improve spark sql coverage in CI (#5512)
Add GitHub actions tasks to run spark sql UTs under spark 3.1 and 3.2.
2022-05-06 05:52:06 -07:00
Jin Xing
248b0591b0 [HUDI-4042] Support truncate-partition for Spark-3.2 (#5506) 2022-05-06 00:29:47 -07:00
guanziyue
abb4893b25 [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor exit gracefully (#4264) 2022-05-05 13:49:34 -07:00
qianchutao
d794f4fbf9 [MINOR] Optimize code logic (#5499) 2022-05-05 09:33:06 -07:00
Y Ethan Guo
f66e83dc65 [HUDI-3667] Run unit tests of hudi-integ-tests in CI (#5078) 2022-05-04 23:39:18 -07:00
Sagar Sumit
1562bb658f [HUDI-4031] Avoid clustering update handling when no pending replacecommit (#5487) 2022-05-04 10:17:11 -04:00
Raymond Xu
8c9209db28 [HUDI-4005] Update release scripts to help validation (#5479) 2022-05-04 10:15:54 -04:00
Sagar Sumit
3343cbb47b [MINOR] Update RFC status (#5486) 2022-05-03 08:57:18 -07:00
Todd Gao
9732ba12da [HUDI-3211][RFC-44] Add RFC for Hudi Connector for Presto (#4563)
* Add RFC doc

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

* Add note regarding catalog naming

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2022-05-02 22:05:23 +05:30
Raymond Xu
6af1ff7a66 [MINOR] Update DOAP for release 0.11.0 (#5467) 2022-04-30 10:51:16 -07:00
Wangyh
33ff4752ba [HUDI-3978] Fix use of partition path field as hive partition field in flink (#5434)
* Fix partition path fields as hive sync partition fields error
2022-04-29 20:58:54 -07:00
xicm
f492c52ee4 [HUDI-3862] Fix default configurations of HoodieHBaseIndexConfig (#5308)
Co-authored-by: xicm <xicm@asiainfo.com>
2022-04-29 16:21:52 -07:00
Y Ethan Guo
a1d82b4dc5 [MINOR] Fix CI by ignoring SparkContext error (#5468)
Sets spark.driver.allowMultipleContexts = true when constructing Spark conf in UtilHelpers
2022-04-29 11:19:07 -07:00
吴祥平
e421d536ea [HUDI-3758] Fix duplicate fileId error in MOR table type with flink bucket hash Index (#5185)
* fix duplicate fileId with bucket Index
* replace to load FileGroup from FileSystemView
2022-04-29 14:10:20 +08:00
Gary Li
b27e8b51d8 [MINOR] support different cleaning policy for flink (#5459) 2022-04-29 09:48:44 +08:00
LiChuang
4e928a6fe1 [HUDI-3943] Some description fixes for 0.10.1 docs (#5447) 2022-04-28 15:18:56 -07:00
Ibson
52953c8f5e [HUDI-3815] Fix docs description of metadata.compaction.delta_commits default value error (#5368)
Co-authored-by: pusheng.li01 <pusheng.li01@liulishuo.com>
2022-04-27 16:09:44 -07:00
watermelon12138
cacbd98687 [HUDI-3945] After the async compaction operation is complete, the task should exit. (#5391)
Co-authored-by: y00617041 <yangxuan42@huawei.com>
2022-04-27 21:16:09 +08:00