Commit Graph

2818 Commits

Author SHA1 Message Date
Sagar Sumit
3343cbb47b [MINOR] Update RFC status (#5486) 2022-05-03 08:57:18 -07:00
Todd Gao
9732ba12da [HUDI-3211][RFC-44] Add RFC for Hudi Connector for Presto (#4563)
* Add RFC doc

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

* Add note regarding catalog naming

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2022-05-02 22:05:23 +05:30
Raymond Xu
6af1ff7a66 [MINOR] Update DOAP for release 0.11.0 (#5467) 2022-04-30 10:51:16 -07:00
Wangyh
33ff4752ba [HUDI-3978] Fix use of partition path field as hive partition field in flink (#5434)
* Fix partition path fields as hive sync partition fields error
2022-04-29 20:58:54 -07:00
xicm
f492c52ee4 [HUDI-3862] Fix default configurations of HoodieHBaseIndexConfig (#5308)
Co-authored-by: xicm <xicm@asiainfo.com>
2022-04-29 16:21:52 -07:00
Y Ethan Guo
a1d82b4dc5 [MINOR] Fix CI by ignoring SparkContext error (#5468)
Sets spark.driver.allowMultipleContexts = true when constructing Spark conf in UtilHelpers
2022-04-29 11:19:07 -07:00
吴祥平
e421d536ea [HUDI-3758] Fix duplicate fileId error in MOR table type with flink bucket hash Index (#5185)
* fix duplicate fileId with bucket Index
* replace to load FileGroup from FileSystemView
2022-04-29 14:10:20 +08:00
Gary Li
b27e8b51d8 [MINOR] support different cleaning policy for flink (#5459) 2022-04-29 09:48:44 +08:00
LiChuang
4e928a6fe1 [HUDI-3943] Some description fixes for 0.10.1 docs (#5447) 2022-04-28 15:18:56 -07:00
Ibson
52953c8f5e [HUDI-3815] Fix docs description of metadata.compaction.delta_commits default value error (#5368)
Co-authored-by: pusheng.li01 <pusheng.li01@liulishuo.com>
2022-04-27 16:09:44 -07:00
watermelon12138
cacbd98687 [HUDI-3945] After the async compaction operation is complete, the task should exit. (#5391)
Co-authored-by: y00617041 <yangxuan42@huawei.com>
2022-04-27 21:16:09 +08:00
huberylee
924e2e96a6 Claim RFC 52 for Introduce Secondary Index to Improve HUDI Query Performance (#5441) 2022-04-27 14:07:29 +08:00
Danny Chan
e1ccf2e00b [HUDI-3977] Flink hudi table with date type partition path throws HoodieNotSupportedException (#5432) 2022-04-27 13:19:55 +08:00
KnightChess
6ec039ba42 [MINOR] Update alter rename command class type for pattern matching (#5381) 2022-04-26 19:39:51 -07:00
Yann Byron
77e333298d [HUDI-3478] Claim RFC 51 For CDC (#5437) 2022-04-26 20:56:47 +05:30
Sivabalan Narayanan
762623a15c [HUDI-3972] Fixing hoodie.properties/tableConfig for no preCombine field with writes (#5424)
Fixed instantiation of new table to set the null for preCombine if not explicitly set by the user.
2022-04-25 23:03:10 -04:00
Yuwei XIAO
f2ba0fead2 [HUDI-3085] Improve bulk insert partitioner abstraction (#4441) 2022-04-25 18:42:17 +08:00
ForwardXu
9054b85961 Revert "[HUDI-3951]support general parameter 'sink.parallelism' for flink-hudi (#5405)" (#5421)
This reverts commit bda3db078e.
2022-04-25 12:58:27 +08:00
Ruguo Yu
d994c58cc0 [HUDI-3946] Validate option path in flink hudi sink (#5397) 2022-04-25 10:13:47 +08:00
hehuiyuan
bda3db078e support general parameter 'sink.parallelism' for flink-hudi (#5405)
Co-authored-by: hehuiyuan1 <hehuiyuan@jd.com>
2022-04-24 19:09:39 +08:00
miomiocat
5e5c177e4b [HUDI-3923] Fix cast exception while reading boolean type of partitioned field (#5373) 2022-04-23 20:12:54 +08:00
Y Ethan Guo
8633bd6e06 [HUDI-3948] Fix presto bundle missing HBase classes (#5398) 2022-04-23 01:33:55 -07:00
Raymond Xu
505ee672ac [HUDI-3950] add parquet-avro to gcp-bundle (#5399) 2022-04-23 11:59:49 +08:00
Sivabalan Narayanan
7523542c1d [HUDI-3947] Fixing Hive conf usage in HoodieSparkSqlWriter (#5401) 2022-04-22 22:20:05 -04:00
Y Ethan Guo
20781a5fa6 [DOCS] Add commit activity, twitter badges, and Hudi logo in README (#5336) 2022-04-22 16:51:07 +08:00
Alexey Kudinkin
c05a4e7b6f [HUDI-3934] Fix Spark32HoodieParquetFileFormat not being compatible w/ Spark 3.2.0 (#5378)
- Due to the fact that Spark 3.2.1 is non-BWC w/ 3.2.0, we have to handle all these incompatibilities in Spark32HoodieParquetFileFormat. This PR is addressing that.

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-04-21 21:00:38 -04:00
Y Ethan Guo
c4bc2deea0 [HUDI-3936] Fix projection for a nested field as pre-combined key (#5379)
This PR fixes the projection logic around a nested field which is used as the pre-combined key field. The fix is to only check and append the root level field for projection, i.e., "a", for a nested field "a.b.c" in the mandatory columns.

- Changes the logic to check and append the root level field for a required nested field in the mandatory columns in HoodieBaseRelation.appendMandatoryColumns
2022-04-21 20:17:57 -04:00
xiarixiaoyao
037f89ee7c [HUDI-3921] Fixed schema evolution cannot work with HUDI-3855 (#5376)
- when column names are renamed (schema evolution enabled), the renamed columns weren't handled correctly while copying records from the old data file with HoodieMergeHandle.
2022-04-21 18:27:54 -04:00
Sagar Sumit
de5fa1fe03 [HUDI-3940] Fix retry count increment in lock manager (#5387) 2022-04-21 16:52:05 -04:00
Raymond Xu
4e1ac467da [MINOR] Increase azure CI timeout to 120m (#5384) 2022-04-21 04:35:44 -07:00
Alexey Kudinkin
4b296f79cc [HUDI-3935] Adding config to fallback to enabled Partition Values extraction from Partition path (#5377) 2022-04-21 01:36:19 -07:00
Sivabalan Narayanan
a9506aa545 [HUDI-3938] Fix default value for num retries to acquire lock (#5380) 2022-04-21 01:08:43 -07:00
Alexey Kudinkin
f7544e23ac [HUDI-3204] Fixing partition-values being derived from partition-path instead of source columns (#5364)
- Scaffolded `Spark24HoodieParquetFileFormat` extending `ParquetFileFormat` and overriding the behavior of adding partition columns to every row
- Amended `SparkAdapter`s `createHoodieParquetFileFormat` API to be able to configure whether to append partition values or not
- Fallback to append partition values in cases when the source columns are not persisted in data-file
- Fixing HoodieBaseRelation incorrectly handling mandatory columns
2022-04-20 19:30:27 +08:00
吴祥平
408663c42b [HUDI-3912] Fix lose data when rollback in flink async compact (#5357)
* stop add event when has failed compact event

Co-authored-by: wxp <wxp4532@outlook.com>
2022-04-20 19:23:39 +08:00
Zhaojing Yu
6a3ce928b1 [HUDI-3904] Claim RFC number for Improve timeline server (#5354) 2022-04-19 23:31:21 -07:00
Danny Chan
7a9e411e9d [HUDI-3917] Flink write task hangs if last checkpoint has no data input (#5360) 2022-04-20 12:48:24 +08:00
Y Ethan Guo
28fdddfee0 [HUDI-3920] Fix partition path construction in metadata table validator (#5365) 2022-04-19 19:40:09 -04:00
Y Ethan Guo
6f3fe880d2 [HUDI-3905] Add S3 related setup in Kafka Connect quick start (#5356) 2022-04-19 15:08:28 -07:00
Alexey Kudinkin
81bf771e56 [HUDI-3902] Fallback to HadoopFsRelation in cases non-involving Schema Evolution (#5352)
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-04-19 10:40:20 -07:00
Raymond Xu
9af7b09aec [HUDI-3894] Fix gcp bundle to include HBase dependencies and shading (#5349) 2022-04-18 21:47:10 -07:00
Sagar Sumit
4f44e6aeb5 [HUDI-3899] Drop index to delete pending index instants from timeline if applicable (#5342)
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-04-18 22:28:46 -04:00
Y Ethan Guo
52d878c52b [HUDI-3903] Fix NoClassDefFoundError with Kafka Connect bundle (#5353) 2022-04-18 21:17:53 -04:00
Y Ethan Guo
ef6c5611dc [HUDI-3894] Fix datahub to include HBase dependencies and shading (#5338)
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-04-18 16:20:50 -07:00
Alexey Kudinkin
7ecb47cd21 [HUDI-3895] Fixing file-partitioning seq for base-file only views to make sure we bucket the files efficiently (#5337) 2022-04-18 16:06:52 -04:00
Sagar Sumit
1718bcab84 [HUDI-3707] Fix target schema handling in HoodieSparkUtils while creating RDD (#5347) 2022-04-18 13:34:04 -04:00
Sivabalan Narayanan
b00d03fd62 [HUDI-3886] Adding default null for some of the fields in col stats in MDT schema (#5329) 2022-04-18 10:37:03 -04:00
Sivabalan Narayanan
05dfc39c29 Fixing async clustering job test in TestHoodieDeltaStreamer (#5317) 2022-04-18 17:38:33 +05:30
董可伦
b8e465fdfc [MINOR] Fix typos in log4j-surefire.properties (#5212) 2022-04-15 13:33:37 -07:00
董可伦
99dd1cb6e6 [HUDI-3835] Add UT for delete in java client (#5270) 2022-04-15 15:03:48 -04:00
Sivabalan Narayanan
e8ab915aff [MINOR] Removing invalid code to close parquet reader iterator (#5182) 2022-04-15 14:50:07 -04:00