
2789 Commits

Author SHA1 Message Date
Raymond Xu
4e1ac467da [MINOR] Increase azure CI timeout to 120m (#5384) 2022-04-21 04:35:44 -07:00
Alexey Kudinkin
4b296f79cc [HUDI-3935] Adding config to fallback to enabled Partition Values extraction from Partition path (#5377) 2022-04-21 01:36:19 -07:00
Sivabalan Narayanan
a9506aa545 [HUDI-3938] Fix default value for num retries to acquire lock (#5380) 2022-04-21 01:08:43 -07:00
Alexey Kudinkin
f7544e23ac [HUDI-3204] Fixing partition-values being derived from partition-path instead of source columns (#5364)
- Scaffolded `Spark24HoodieParquetFileFormat` extending `ParquetFileFormat` and overriding the behavior of adding partition columns to every row
- Amended the `SparkAdapter`'s `createHoodieParquetFileFormat` API to be able to configure whether to append partition values or not
- Fall back to appending partition values in cases when the source columns are not persisted in the data file
- Fixing HoodieBaseRelation incorrectly handling mandatory columns
2022-04-20 19:30:27 +08:00
吴祥平
408663c42b [HUDI-3912] Fix data loss on rollback in Flink async compaction (#5357)
* Stop adding events when there is a failed compaction event

Co-authored-by: wxp <wxp4532@outlook.com>
2022-04-20 19:23:39 +08:00
Zhaojing Yu
6a3ce928b1 [HUDI-3904] Claim RFC number for Improve timeline server (#5354) 2022-04-19 23:31:21 -07:00
Danny Chan
7a9e411e9d [HUDI-3917] Flink write task hangs if last checkpoint has no data input (#5360) 2022-04-20 12:48:24 +08:00
Y Ethan Guo
28fdddfee0 [HUDI-3920] Fix partition path construction in metadata table validator (#5365) 2022-04-19 19:40:09 -04:00
Y Ethan Guo
6f3fe880d2 [HUDI-3905] Add S3 related setup in Kafka Connect quick start (#5356) 2022-04-19 15:08:28 -07:00
Alexey Kudinkin
81bf771e56 [HUDI-3902] Fall back to HadoopFsRelation in cases not involving Schema Evolution (#5352)
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-04-19 10:40:20 -07:00
Raymond Xu
9af7b09aec [HUDI-3894] Fix gcp bundle to include HBase dependencies and shading (#5349) 2022-04-18 21:47:10 -07:00
Sagar Sumit
4f44e6aeb5 [HUDI-3899] Drop index to delete pending index instants from timeline if applicable (#5342)
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-04-18 22:28:46 -04:00
Y Ethan Guo
52d878c52b [HUDI-3903] Fix NoClassDefFoundError with Kafka Connect bundle (#5353) 2022-04-18 21:17:53 -04:00
Y Ethan Guo
ef6c5611dc [HUDI-3894] Fix datahub to include HBase dependencies and shading (#5338)
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-04-18 16:20:50 -07:00
Alexey Kudinkin
7ecb47cd21 [HUDI-3895] Fixing file-partitioning seq for base-file only views to make sure we bucket the files efficiently (#5337) 2022-04-18 16:06:52 -04:00
Sagar Sumit
1718bcab84 [HUDI-3707] Fix target schema handling in HoodieSparkUtils while creating RDD (#5347) 2022-04-18 13:34:04 -04:00
Sivabalan Narayanan
b00d03fd62 [HUDI-3886] Adding default null for some of the fields in col stats in MDT schema (#5329) 2022-04-18 10:37:03 -04:00
Sivabalan Narayanan
05dfc39c29 Fixing async clustering job test in TestHoodieDeltaStreamer (#5317) 2022-04-18 17:38:33 +05:30
董可伦
b8e465fdfc [MINOR] Fix typos in log4j-surefire.properties (#5212) 2022-04-15 13:33:37 -07:00
董可伦
99dd1cb6e6 [HUDI-3835] Add UT for delete in java client (#5270) 2022-04-15 15:03:48 -04:00
Sivabalan Narayanan
e8ab915aff [MINOR] Removing invalid code to close parquet reader iterator (#5182) 2022-04-15 14:50:07 -04:00
Sivabalan Narayanan
57612c5c32 [HUDI-3848] Fixing restore with cleaned up commits (#5288) 2022-04-15 14:47:53 -04:00
Raymond Xu
9e8664f4d2 [HOTFIX] add missing license (#5322) (#5324) 2022-04-14 12:35:20 -07:00
Raymond Xu
d6a64f765e Revert "[HUDI-3652] Make ObjectSizeCalculator threadlocal to reduce memory footprint (#5060)" (#5323)
This reverts commit f0ab4a6e9e.
2022-04-14 12:28:27 -07:00
sekaiga
f0ab4a6e9e [HUDI-3652] Make ObjectSizeCalculator threadlocal to reduce memory footprint (#5060)
Co-authored-by: zhouhuidong <zhouhuidong@bilibili.co>
2022-04-14 03:08:14 -07:00
ForwardXu
6621f3cdbb [HUDI-3845] Fix delete mor table's partition with urlencode's error (#5282) 2022-04-14 01:49:00 -07:00
ForwardXu
44b3630b5d [HUDI-3826] Make truncate partition use delete_partition operation (#5272)
Make truncate partition and drop partition behave like drop partition with purge, which deletes all records via Hudi DELETE_PARTITION and removes the partition from the metastore.
2022-04-14 00:53:05 -07:00
Sivabalan Narayanan
a081c2b9b5 [HUDI-3876] Fixing fetching partitions in GlueSyncClient (#5318) 2022-04-13 21:03:05 -07:00
Y Ethan Guo
571cbe4c11 [MINOR] Code cleanup in test utils (#5312) 2022-04-13 17:37:07 -04:00
Y Ethan Guo
bab691692e [HUDI-3686] Fix inline and async table service check in HoodieWriteConfig (#5307) 2022-04-13 17:33:26 -04:00
Y Ethan Guo
c7f41f9018 [HUDI-3869] Improve error handling of loading Hudi conf (#5311) 2022-04-13 17:25:31 -04:00
Danny Chan
6f9b02decb [HUDI-3870] Add timeout rollback for flink online compaction (#5314) 2022-04-13 20:05:48 +08:00
Danny Chan
0281725c6b [MINOR] Inline the partition path logic into the builder (#5310) 2022-04-13 16:54:39 +05:30
Danny Chan
43de2b4702 [HUDI-3868] Disable the sort input for flink streaming append mode (#5309) 2022-04-13 14:21:08 +08:00
Alexey Kudinkin
434e782b7d [HUDI-3867] Disable Data Skipping by default (#5306) 2022-04-13 11:21:12 +05:30
Alexey Kudinkin
7b78dff45f [HUDI-3855] Fixing FILENAME_METADATA_FIELD not being correctly updated in HoodieMergeHandle (#5296)
Fixing FILENAME_METADATA_FIELD not being correctly updated in HoodieMergeHandle in cases when the old record is carried over from the existing file as-is.

- Revisited the HoodieFileWriter API to accept HoodieKey instead of HoodieRecord
- Fixed FILENAME_METADATA_FIELD not being overridden in cases when the old record is simply carried over
- Exposing the standard JVM debugger ports in the Docker setup
2022-04-12 20:42:15 -04:00
Raymond Xu
2e6e302efe [HUDI-3859] Fix spark profiles and utilities-slim dep (#5297) 2022-04-12 15:33:08 -07:00
Vinoth Govindarajan
2d46d5287e [HUDI-3838] Moved the getPartitionColumns logic to driver. (#5303) 2022-04-12 18:03:00 -04:00
satishm
25dce94ba2 [MINOR] Integ Test Reducing partitions for long-running multi-partition yaml (#5300) 2022-04-12 12:15:17 -04:00
Raymond Xu
84783b9779 [HUDI-3843] Make flink profiles build with scala-2.11 (#5279) 2022-04-12 08:33:48 -07:00
Vinoth Govindarajan
d16740976e [HUDI-3838] Implemented drop partition column feature for delta streamer code path (#5294)
* [HUDI-3838] Implemented drop partition column feature for delta streamer code path

* Ensure drop partition table config is updated in hoodie.props

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2022-04-12 18:10:30 +05:30
Alexey Kudinkin
101b82a679 [HUDI-3839] Fixing incorrect selection of MT partitions to be updated (#5274)
* Fixing incorrect selection of MT partitions to be updated

* Ensure that metadata partitions table config is inherited correctly

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2022-04-12 13:37:52 +05:30
Sivabalan Narayanan
f91e9e63e1 [HUDI-3799] Fixing not deleting empty instants w/o archiving (#5261) 2022-04-11 21:02:43 -07:00
Sagar Sumit
3d8fc78c66 [HUDI-3844] Update props in indexer based on table config (#5293) 2022-04-11 18:16:06 -04:00
Alexey Kudinkin
458fdd5611 [HUDI-3841] Fixing Column Stats in the presence of Schema Evolution (#5275)
Currently, Data Skipping does not correctly handle the case when column stats are not aligned, for example when some of the (column, file) combinations are missing from the CSI.

This can occur in different scenarios (schema evolution, CSI config changes) and has to be handled properly when composing the CSI projection for Data Skipping. This PR addresses that.

- Added appropriate alignment for the transposed CSI projection
2022-04-11 15:45:53 -04:00
Sivabalan Narayanan
52ea1e4964 [MINOR] fixing timeline server for integ tests (#5289) 2022-04-11 10:14:51 -04:00
RexXiong
5c41e30ac5 [HUDI-3817] shade parquet dependency for hudi-hadoop-mr-bundle (#5250)
Co-authored-by: lvshuang.xjs <lvshuang.xjs@alibaba-inc.com>
2022-04-11 05:44:46 -07:00
Sivabalan Narayanan
2245a9515f [HUDI-3798] Fixing ending of a transaction by different owner and removing some extraneous methods in trxn manager (#5255) 2022-04-11 10:16:07 +05:30
Y Ethan Guo
63a099c5b7 [HUDI-3847] Fix NPE due to null schema in HoodieMetadataTableValidator (#5284) 2022-04-10 17:59:29 -07:00
Sivabalan Narayanan
12731f5b89 [HUDI-3842] Integ tests for non partitioned datasets (#5276)
- Adding non-partitioned support to integ tests
- Fixing some of the test yamls and properties
2022-04-10 20:09:48 -04:00