Bo Cui
3fdc9332e5
[HUDI-3516] Implement record iterator for HoodieDataBlock ( #4909 )
...
* Use iterator to avoid eager materialization to be memory friendly
2022-03-02 10:19:36 +08:00
ForwardXu
a81a6326d5
[HUDI-3441] Add support for "marker delete" in hudi-cli ( #4922 )
2022-03-01 16:03:53 +08:00
Sivabalan Narayanan
f7088a957c
[HUDI-3497] Adding Datatable validator tool ( #4902 )
2022-02-28 22:46:32 -05:00
Y Ethan Guo
257052a94d
[HUDI-3465] Add validation of column stats and bloom filters in HoodieMetadataTableValidator ( #4878 )
2022-02-28 18:49:30 -08:00
yuzhaojing
44b8ab6048
[HUDI-3418] Save timeout option for remote RemoteFileSystemView ( #4809 )
...
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com >
2022-02-28 15:16:40 -05:00
wenningd
18dc89cf79
[HUDI-3450] Avoid passing empty string spark master to hudi cli ( #4844 )
...
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2022-02-28 11:37:24 -05:00
Y Ethan Guo
05e395ae5f
[HUDI-3341] Fix log file reader for S3 with hadoop-aws 2.7.x ( #4897 )
2022-02-28 11:14:35 -05:00
stayrascal
8f1e4f5b3e
[HUDI-3528] Fix String convert issue and overwrite putAll method in TypedProperties.java ( #4920 )
2022-02-28 10:45:47 -05:00
Sivabalan Narayanan
4a59876c8b
[HUDI-2917] rollback insert data appended to log file when using Hbase Index ( #4840 )
...
Co-authored-by: guanziyue <guanziyue@gmail.com >
2022-02-28 08:13:17 -05:00
Bo Cui
193215201c
[MINOR] Change MINI_BATCH_SIZE to 2048 ( #4862 )
...
ParquetColumnarRowSplitReader#batchSize is 2048, so changing MINI_BATCH_SIZE to 2048 will reduce memory cache.
2022-02-28 10:45:28 +08:00
Sivabalan Narayanan
d5444ff7ff
[HUDI-3018] Adding validation to dataframe schema to ensure reserved field does not have different data type ( #4852 )
2022-02-27 11:59:23 -05:00
Sivabalan Narayanan
2f99e8458a
[HUDI-3521] Fixing Kafka key and value serializer value type from class to string ( #4919 )
2022-02-27 11:13:13 -05:00
Raymond Xu
c77b2591d0
[HUDI-2439] Remove SparkBoundedInMemoryExecutor ( #4860 )
2022-02-26 08:02:12 -05:00
Sivabalan Narayanan
1379300b5b
[HUDI-3483] Adding insert override nodes to integ test suite and few clean ups ( #4895 )
2022-02-26 08:00:15 -05:00
Sagar Sumit
6a5cfb45b9
[MINOR] Fix table type in input format test ( #4912 )
2022-02-25 13:51:53 -05:00
苏承祥
92cdc5987a
[HUDI-3515] Making rdd unpersist optional at the end of writes ( #4898 )
...
Co-authored-by: 苏承祥 <sucx@tuya.com >
2022-02-25 11:30:10 -05:00
Raymond Xu
b50f4b491c
[HUDI-3042] Refactor clustering executors ( #4847 )
2022-02-25 05:39:43 -08:00
YueZhang
742810070b
[HUDI-3421] Pending clustering may break AbstractTableFileSystemView#getxxBaseFile() ( #4810 )
2022-02-25 16:46:27 +05:30
Danny Chan
a4ee7463ae
[HUDI-3474] Add more documentation to Pipelines on using this tool to build a write pipeline ( #4906 )
2022-02-25 19:08:51 +08:00
todd5167
45d1216e91
[HUDI-3401] fix NPE caused by incorrect beforeKeyGenClassName validation ( #4774 )
2022-02-24 23:31:29 -05:00
YueZhang
3694485609
[HUDI-3429] Support clustering scheduleAndExecute for hudi-cli and add clustering-cli Tests ( #4817 )
...
Co-authored-by: yuezhang <yuezhang@freewheel.tv >
2022-02-24 23:28:38 -05:00
ForwardXu
aa1810d737
[HUDI-3493] Not able to get execution plan ( #4894 )
2022-02-24 17:04:44 -08:00
Alexey Kudinkin
85e8a5c4de
[HUDI-1296] Support Metadata Table in Spark Datasource ( #4789 )
...
* Bootstrapping initial support for Metadata Table in Spark Datasource
- Consolidated Avro/Row conversion utilities to center around Spark's AvroDeserializer ; removed duplication
- Bootstrapped HoodieBaseRelation
- Updated HoodieMergeOnReadRDD to be able to handle Metadata Table
- Modified MOR relations to be able to read different Base File formats (Parquet, HFile)
2022-02-24 16:23:13 -05:00
ForwardXu
521338b4d9
[HUDI-3161] Add Call Procedure Command for Spark SQL ( #4535 )
2022-02-24 07:45:37 -08:00
yanenze
943b99775b
[HUDI-3488] The flink small file list should exclude file slices with pending compaction ( #4893 )
...
# this happens when the async-compaction has been configured
Co-authored-by: yanenze <yanenze@keytop.com.cn >
2022-02-24 14:45:03 +08:00
Sivabalan Narayanan
62605be413
[HUDI-3480][HUDI-3481] Enhancements to integ test suite ( #4884 )
2022-02-23 15:56:35 -05:00
leesf
2a93b8efb2
[HUDI-3489] Unify config to avoid duplicate code ( #4883 )
2022-02-23 08:14:30 -05:00
Y Ethan Guo
4e8accc179
[HUDI-3486] Fix wrong field order for constructing HoodieMetadataColumnStats ( #4875 )
2022-02-23 10:27:02 +05:30
yuzhaojing
dabae80423
[HUDI-3420] Remove duplicate type in HoodieClusteringGroup.avsc ( #4808 )
...
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com >
2022-02-23 10:49:47 +08:00
从大数据到人工智能
01cbddef78
Add hive-standalone-metastore dependency to hudi-flink-bundle module ( #4870 )
2022-02-23 09:16:21 +08:00
Sivabalan Narayanan
9678c3fbcf
[MINOR] Fixing checkpoint management in S3IncrSource ( #4871 )
2022-02-22 09:15:16 -05:00
Danny Chan
b87e95d621
[HUDI-3476] Remove the shade pattern for parquet for flink bundle jar ( #4869 )
2022-02-22 19:21:57 +08:00
Danny Chan
4affdd0c8f
[HUDI-3461] The archived timeline for flink streaming reader should not be reused ( #4861 )
...
* Before the patch, the flink streaming reader cached the meta client and thus the archived timeline;
fetching the instant details from the reused timeline threw an exception
* Add a method in HoodieTableMetaClient to return a fresh new archived timeline each time
2022-02-22 15:54:29 +08:00
wangxianghu
4d1f74ebea
[HUDI-3464] Fix wrong exception thrown from HiveSchemaProvider ( #4865 )
2022-02-22 10:20:20 +04:00
Sivabalan Narayanan
14dbbdf4c7
[HUDI-2189] Adding delete partitions support to DeltaStreamer ( #4787 )
2022-02-22 00:01:30 -05:00
Y Ethan Guo
7e1ea06eb9
[MINOR] Fix typos and improve docs in HoodieMetadataConfig ( #4867 )
2022-02-21 19:36:20 -08:00
Prashant Wason
0dee8edc97
[HUDI-2925] Fix duplicate cleaning of same files when unfinished clean operations are present using a config. ( #4212 )
...
Co-authored-by: sivabalan <n.siva.b@gmail.com >
2022-02-21 21:53:03 -05:00
Yann Byron
0c950181aa
[HUDI-3423] upgrade spark to 3.2.1 ( #4815 )
2022-02-21 16:52:21 -08:00
RexAn
801fdab55c
[HUDI-3042] Abstract Spark update Strategy to make code more clean and remove duplicates ( #4845 )
...
Co-authored-by: Hui An <hui.an@shopee.com >
2022-02-21 06:53:09 -08:00
Pratyaksh Sharma
bf16bc122a
[HUDI-349]: Added new cleaning policy based on number of hours ( #3646 )
2022-02-21 09:04:42 -05:00
Sivabalan Narayanan
d36fe24c9e
[HUDI-3455] Fixing checkpoint management in hoodie incr source ( #4850 )
2022-02-21 08:19:57 -05:00
Sivabalan Narayanan
17cb5cb433
[HUDI-3432] Fixing restore with metadata enabled ( #4849 )
...
* Fixing restore with metadata enabled
* Fixing test failures
2022-02-21 18:25:30 +05:30
leesf
76b6ad6491
[HUDI-2732][RFC-38] Spark Datasource V2 Integration ( #3964 )
2022-02-21 20:14:07 +08:00
YueZhang
359fbfde79
[HUDI-2648] Retry FileSystem action instead of failing directly. ( #3887 )
...
Co-authored-by: yuezhang <yuezhang@freewheel.tv >
2022-02-20 15:31:31 -05:00
Raymond Xu
0938f55a2b
[HUDI-3458] Fix BulkInsertPartitioner generic type ( #4854 )
2022-02-20 13:51:58 -05:00
Sivabalan Narayanan
66ac1446dd
[MINOR] Moving spark scheduling configs out of DataSourceOptions ( #4843 )
2022-02-20 13:49:18 -05:00
Bo Cui
83279971a1
[HUDI-3446] Supports batch reader in BootstrapOperator#loadRecords ( #4837 )
...
* [HUDI-3446] Supports batch Reader in BootstrapOperator#loadRecords
2022-02-19 21:21:48 +08:00
stayrascal
f15125c0cd
[HUDI-3389] fix ColumnarArrayData ClassCastException issue ( #4842 )
...
* [HUDI-3389] fix ColumnarArrayData ClassCastException issue
* [HUDI-3389] remove MapColumnVector.java, RowColumnVector.java, and add test case for array<int> field
2022-02-19 10:56:41 +08:00
RexAn
5009138d04
[HUDI-3438] Avoid getSmallFiles if hoodie.parquet.small.file.limit is 0 ( #4823 )
...
Co-authored-by: Hui An <hui.an@shopee.com >
2022-02-18 08:57:04 -05:00
Y Ethan Guo
fba5822ee3
[HUDI-3430] Fix Deltastreamer to properly shut down the services upon failure ( #4824 )
2022-02-18 08:44:56 -05:00