
2522 Commits

Author SHA1 Message Date
Y Ethan Guo
257052a94d [HUDI-3465] Add validation of column stats and bloom filters in HoodieMetadataTableValidator (#4878) 2022-02-28 18:49:30 -08:00
yuzhaojing
44b8ab6048 [HUDI-3418] Save timeout option for remote RemoteFileSystemView (#4809)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2022-02-28 15:16:40 -05:00
wenningd
18dc89cf79 [HUDI-3450] Avoid passing empty string spark master to hudi cli (#4844)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2022-02-28 11:37:24 -05:00
Y Ethan Guo
05e395ae5f [HUDI-3341] Fix log file reader for S3 with hadoop-aws 2.7.x (#4897) 2022-02-28 11:14:35 -05:00
stayrascal
8f1e4f5b3e [HUDI-3528] Fix String convert issue and overwrite putAll method in TypedProperties.java (#4920) 2022-02-28 10:45:47 -05:00
Sivabalan Narayanan
4a59876c8b [HUDI-2917] rollback insert data appended to log file when using Hbase Index (#4840)
Co-authored-by: guanziyue <guanziyue@gmail.com>
2022-02-28 08:13:17 -05:00
Bo Cui
193215201c [MINOR] Change MINI_BATCH_SIZE to 2048 (#4862)
ParquetColumnarRowSplitReader#batchSize is 2048, so changing MINI_BATCH_SIZE to 2048 will reduce memory cache.
2022-02-28 10:45:28 +08:00
Sivabalan Narayanan
d5444ff7ff [HUDI-3018] Adding validation to dataframe schema to ensure reserved field does not have diff data type (#4852) 2022-02-27 11:59:23 -05:00
Sivabalan Narayanan
2f99e8458a [HUDI-3521] Fixing Kafka key and value serializer value type from class to string (#4919) 2022-02-27 11:13:13 -05:00
Raymond Xu
c77b2591d0 [HUDI-2439] Remove SparkBoundedInMemoryExecutor (#4860) 2022-02-26 08:02:12 -05:00
Sivabalan Narayanan
1379300b5b [HUDI-3483] Adding insert override nodes to integ test suite and few clean ups (#4895) 2022-02-26 08:00:15 -05:00
Sagar Sumit
6a5cfb45b9 [MINOR] Fix table type in input format test (#4912) 2022-02-25 13:51:53 -05:00
苏承祥
92cdc5987a [HUDI-3515] Making rdd unpersist optional at the end of writes (#4898)
Co-authored-by: 苏承祥 <sucx@tuya.com>
2022-02-25 11:30:10 -05:00
Raymond Xu
b50f4b491c [HUDI-3042] Refactor clustering executors (#4847) 2022-02-25 05:39:43 -08:00
YueZhang
742810070b [HUDI-3421] Pending clustering may break AbstractTableFileSystemView#getxxBaseFile() (#4810) 2022-02-25 16:46:27 +05:30
Danny Chan
a4ee7463ae [HUDI-3474] Add more document to Pipelines for the usage of this tool to build a write pipeline (#4906) 2022-02-25 19:08:51 +08:00
todd5167
45d1216e91 [HUDI-3401] fix NPE caused by incorrect beforeKeyGenClassName validation (#4774) 2022-02-24 23:31:29 -05:00
YueZhang
3694485609 [HUDI-3429] Support clustering scheduleAndExecute for hudi-cli and add clustering-cli Tests (#4817)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-02-24 23:28:38 -05:00
ForwardXu
aa1810d737 [HUDI-3493] Not table to get execution plan (#4894) 2022-02-24 17:04:44 -08:00
Alexey Kudinkin
85e8a5c4de [HUDI-1296] Support Metadata Table in Spark Datasource (#4789)
* Bootstrapping initial support for Metadata Table in Spark Datasource

- Consolidated Avro/Row conversion utilities to center around Spark's AvroDeserializer ; removed duplication
- Bootstrapped HoodieBaseRelation
- Updated HoodieMergeOnReadRDD to be able to handle Metadata Table
- Modified MOR relations to be able to read different Base File formats (Parquet, HFile)
2022-02-24 16:23:13 -05:00
ForwardXu
521338b4d9 [HUDI-3161] Add Call Produce Command for Spark SQL (#4535) 2022-02-24 07:45:37 -08:00
yanenze
943b99775b [HUDI-3488] The flink small file list should exclude file slices with pending compaction (#4893)
# this happens when the async-compaction has been configured

Co-authored-by: yanenze <yanenze@keytop.com.cn>
2022-02-24 14:45:03 +08:00
Sivabalan Narayanan
62605be413 [HUDI-3480][HUDI-3481] Enhancements to integ test suite (#4884) 2022-02-23 15:56:35 -05:00
leesf
2a93b8efb2 [HUDI-3489] Unify config to avoid duplicate code (#4883) 2022-02-23 08:14:30 -05:00
Y Ethan Guo
4e8accc179 [HUDI-3486] Fix wrong field order for constructing HoodieMetadataColumnStats (#4875) 2022-02-23 10:27:02 +05:30
yuzhaojing
dabae80423 [HUDI-3420] Remove duplicates type in HoodieClusteringGroup.avsc (#4808)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2022-02-23 10:49:47 +08:00
从大数据到人工智能
01cbddef78 Add hive-standalone-metastore dependency to hudi-flink-bundle module (#4870) 2022-02-23 09:16:21 +08:00
Sivabalan Narayanan
9678c3fbcf [MINOR] Fixing checkpoint management in S3IncrSource (#4871) 2022-02-22 09:15:16 -05:00
Danny Chan
b87e95d621 [HUDI-3476] Remove the shade pattern for parquet for flink bundle jar (#4869) 2022-02-22 19:21:57 +08:00
Danny Chan
4affdd0c8f [HUDI-3461] The archived timeline for flink streaming reader should not be reused (#4861)
* Before the patch, the flink streaming reader caches the meta client thus the archived timeline,
  when fetching the instant details from the reused timeline, the exception throws
* Add a method in HoodieTableMetaClient to return a fresh new archived timeline each time
2022-02-22 15:54:29 +08:00
wangxianghu
4d1f74ebea [HUDI-3464] Fix wrong exception thrown from HiveSchemaProvider (#4865) 2022-02-22 10:20:20 +04:00
Sivabalan Narayanan
14dbbdf4c7 [HUDI-2189] Adding delete partitions support to DeltaStreamer (#4787) 2022-02-22 00:01:30 -05:00
Y Ethan Guo
7e1ea06eb9 [MINOR] Fix typos and improve docs in HoodieMetadataConfig (#4867) 2022-02-21 19:36:20 -08:00
Prashant Wason
0dee8edc97 [HUDI-2925] Fix duplicate cleaning of same files when unfinished clean operations are present using a config. (#4212)
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-02-21 21:53:03 -05:00
Yann Byron
0c950181aa [HUDI-3423] upgrade spark to 3.2.1 (#4815) 2022-02-21 16:52:21 -08:00
RexAn
801fdab55c [HUDI-3042] Abstract Spark update Strategy to make code more clean and remove duplicates (#4845)
Co-authored-by: Hui An <hui.an@shopee.com>
2022-02-21 06:53:09 -08:00
Pratyaksh Sharma
bf16bc122a [HUDI-349]: Added new cleaning policy based on number of hours (#3646) 2022-02-21 09:04:42 -05:00
Sivabalan Narayanan
d36fe24c9e [HUDI-3455] Fixing checkpoint management in hoodie incr source (#4850) 2022-02-21 08:19:57 -05:00
Sivabalan Narayanan
17cb5cb433 [HUDI-3432] Fixing restore with metadata enabled (#4849)
* Fixing restore with metadata enabled

* Fixing test failures
2022-02-21 18:25:30 +05:30
leesf
76b6ad6491 [HUDI-2732][RFC-38] Spark Datasource V2 Integration (#3964) 2022-02-21 20:14:07 +08:00
YueZhang
359fbfde79 [HUDI-2648] Retry FileSystem action instead of failing directly. (#3887)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-02-20 15:31:31 -05:00
Raymond Xu
0938f55a2b [HUDI-3458] Fix BulkInsertPartitioner generic type (#4854) 2022-02-20 13:51:58 -05:00
Sivabalan Narayanan
66ac1446dd [MINOR] Moving spark scheduling configs out of DataSourceOptions (#4843) 2022-02-20 13:49:18 -05:00
Bo Cui
83279971a1 [HUDI-3446] Supports batch reader in BootstrapOperator#loadRecords (#4837)
* [HUDI-3446] Supports batch Reader in BootstrapOperator#loadRecords
2022-02-19 21:21:48 +08:00
stayrascal
f15125c0cd [HUDI-3389] fix ColumnarArrayData ClassCastException issue (#4842)
* [HUDI-3389] fix ColumnarArrayData ClassCastException issue

* [HUDI-3389] remove MapColumnVector.java, RowColumnVector.java, and add test case for array<int> field
2022-02-19 10:56:41 +08:00
RexAn
5009138d04 [HUDI-3438] Avoid getSmallFiles if hoodie.parquet.small.file.limit is 0 (#4823)
Co-authored-by: Hui An <hui.an@shopee.com>
2022-02-18 08:57:04 -05:00
Y Ethan Guo
fba5822ee3 [HUDI-3430] Fix Deltastreamer to properly shut down the services upon failure (#4824) 2022-02-18 08:44:56 -05:00
luokey
de8161ae96 HoodieSortedMergeHandle#close write data disorder (#4841)
Co-authored-by: 854194341@qq.com <loukey_7821>
2022-02-18 13:31:38 +04:00
Sagar Sumit
ed106f671e [HUDI-2809] Introduce a checksum mechanism for validating hoodie.properties (#4712)
Fix dependency conflict

Fix repairs command

Implement putIfAbsent for DDB lock provider

Add upgrade step and validate while fetching configs

Validate checksum for latest table version only while fetching config

Move generateChecksum to BinaryUtil

Rebase and resolve conflict

Fix table version check
2022-02-18 10:17:06 +05:30
Danny Chan
2844a77b43 [HUDI-3439] Remove the hive shade pattern for flink bundle jar (#4833) 2022-02-17 22:42:39 +08:00