2546 Commits

Author SHA1 Message Date
wangxianghu
c9ffdc493e [HUDI-3525] Introduce JsonkafkaSourceProcessor to support data preprocess before it is transformed to DataSet (#4930) 2022-03-06 15:41:01 -05:00
wangxianghu
4b471772aa [HUDI-3520] Introduce DeleteSupportSchemaPostProcessor to support adding _hoodie_is_deleted column to schema (#4921) 2022-03-06 15:37:09 -05:00
Aditya Tiwari
051ad0b033 [HUDI-3130] Fixing Hive getSchema for RT tables addressing different partitions having different schemas (#4468)
* Fixing Hive getSchema for RT tables

* Addressing feedback

* temp diff

* fixing tests after spark datasource read support for metadata table is merged to master

* Adding multi-partition schema evolution tests to HoodieRealTimeRecordReader

Co-authored-by: Aditya Tiwari <aditya.tiwari@flipkart.com>
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-03-06 07:51:35 +05:30
Sivabalan Narayanan
6a46130037 [HUDI-2761] Fixing timeline server for repeated refreshes (#4812)
* Fixing timeline server for repeated refreshes
2022-03-05 10:04:16 +08:00
Bo Cui
0986d5a01d [HUDI-3460] Add reader merge memory option for flink (#4911)
* flink TM memory Optimization
2022-03-04 19:29:29 +08:00
Raymond Xu
b4362fac45 [HUDI-3348] Add UT to verify HoodieRealtimeFileSplit serde (#4951) 2022-03-04 11:19:16 +04:00
Yuwei XIAO
f449807630 [MINOR] fix UTC timezone config (#4950) 2022-03-04 11:09:39 +04:00
ForwardXu
6faed3d90a [HUDI-3161][RFC-47] Add Call Produce Command for Spark SQL (#4607) 2022-03-03 20:02:46 -08:00
shibei
62f534d002 [HUDI-3445] Support Clustering Command Based on Call Procedure Command for Spark SQL (#4901)
* [HUDI-3445] Clustering Command Based on Call Procedure Command for Spark SQL

* [HUDI-3445] Clustering Command Based on Call Procedure Command for Spark SQL

* [HUDI-3445] Clustering Command Based on Call Procedure Command for Spark SQL

Co-authored-by: shibei <huberylee.li@alibaba-inc.com>
2022-03-04 09:33:16 +08:00
RexAn
be9a264885 [HUDI-3548] Fix if user specify key "hoodie.datasource.clustering.async.enable" directly, async clustering not work (#4905)
Co-authored-by: Rex An <bonean131@gmail.com>
2022-03-03 19:14:07 -05:00
Danny Chan
a4ba0fff07 [HUDI-3552] Strength the NetworkUtils#getHostname by checking network interfaces first (#4942)
* In some complex network environment, the current code returns wildcard address 0.0.0.0 which is not desired.
2022-03-03 21:11:08 +08:00
Sivabalan Narayanan
876a891979 [HUDI-3544] Fixing "populate meta fields" update to metadata table (#4941)
* Fixing populateMeta fields update to metadata table

* Fix checkstyle violations

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2022-03-03 17:02:25 +05:30
Manoj Govindassamy
51ee5005a6 [HUDI-2973] RFC-27: Data skipping index to improve query performance (#4728)
- Updating the schema used for data skipping index
2022-03-03 15:56:22 +05:30
Pratyaksh Sharma
907e60c252 [HUDI-3264]: made schema registry urls configurable with MTDS (#4779) 2022-03-02 15:30:41 -05:00
liujinhui
527bd34b1c [MINOR] RFC-38 markdown content error (#4933)
* Minor content error

* Minor content error
2022-03-02 19:40:28 +04:00
Sivabalan Narayanan
f8945eca08 [MINOR] Adding more test props to integ tests (#4935) 2022-03-02 08:10:43 -05:00
Danny Chan
1d57bd17c2 [minor] Cosmetic changes following HUDI-3315 (#4934) 2022-03-02 17:44:52 +08:00
Gary Li
10d866f083 [HUDI-3315] RFC-35 Part-1 Support bucket index in Flink writer (#4679)
* Support bucket index in Flink writer
* Use record key as default index key
2022-03-02 15:14:44 +08:00
Alexey Kudinkin
85f47b53df [HUDI-3469] Refactor HoodieTestDataGenerator to provide for reproducible Builds (#4866) 2022-03-01 22:15:26 -08:00
yuzhaojing
3b2da9f138 [HUDI-2631] In CompactFunction, set up the write schema each time with the latest schema (#4000)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2022-03-02 11:18:17 +08:00
stayrascal
3cfb52c413 [MINOR] fix get builtin function issue from Hudi catalog (#4917) 2022-03-02 11:16:19 +08:00
Bo Cui
3fdc9332e5 [HUDI-3516] Implement record iterator for HoodieDataBlock (#4909)
*  Use iterator to void eager materialization to be memory friendly
2022-03-02 10:19:36 +08:00
ForwardXu
a81a6326d5 [HUDI-3441] Add support for "marker delete" in hudi-cli (#4922) 2022-03-01 16:03:53 +08:00
Sivabalan Narayanan
f7088a957c [HUDI-3497] Adding Datatable validator tool (#4902) 2022-02-28 22:46:32 -05:00
Y Ethan Guo
257052a94d [HUDI-3465] Add validation of column stats and bloom filters in HoodieMetadataTableValidator (#4878) 2022-02-28 18:49:30 -08:00
yuzhaojing
44b8ab6048 [HUDI-3418] Save timeout option for remote RemoteFileSystemView (#4809)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2022-02-28 15:16:40 -05:00
wenningd
18dc89cf79 [HUDI-3450] Avoid passing empty string spark master to hudi cli (#4844)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2022-02-28 11:37:24 -05:00
Y Ethan Guo
05e395ae5f [HUDI-3341] Fix log file reader for S3 with hadoop-aws 2.7.x (#4897) 2022-02-28 11:14:35 -05:00
stayrascal
8f1e4f5b3e [HUDI-3528] Fix String convert issue and overwrite putAll method in TypedProperties.java (#4920) 2022-02-28 10:45:47 -05:00
Sivabalan Narayanan
4a59876c8b [HUDI-2917] rollback insert data appended to log file when using Hbase Index (#4840)
Co-authored-by: guanziyue <guanziyue@gmail.com>
2022-02-28 08:13:17 -05:00
Bo Cui
193215201c [MINOR] Change MINI_BATCH_SIZE to 2048 (#4862)
ParquetColumnarRowSplitReader#batchSize is 2048, so Changing MINI_BATCH_SIZE to 2048 will reduce memory cache.
2022-02-28 10:45:28 +08:00
Sivabalan Narayanan
d5444ff7ff [HUDI-3018] Adding validation to dataframe scheme to ensure reserved field does not have diff data type (#4852) 2022-02-27 11:59:23 -05:00
Sivabalan Narayanan
2f99e8458a [HUDI-3521] Fixing kakfa key and value serializer value type from class to string (#4919) 2022-02-27 11:13:13 -05:00
Raymond Xu
c77b2591d0 [HUDI-2439] Remove SparkBoundedInMemoryExecutor (#4860) 2022-02-26 08:02:12 -05:00
Sivabalan Narayanan
1379300b5b [HUDI-3483] Adding insert override nodes to integ test suite and few clean ups (#4895) 2022-02-26 08:00:15 -05:00
Sagar Sumit
6a5cfb45b9 [MINOR] Fix table type in input format test (#4912) 2022-02-25 13:51:53 -05:00
苏承祥
92cdc5987a [HUDI-3515] Making rdd unpersist optional at the end of writes (#4898)
Co-authored-by: 苏承祥 <sucx@tuya.com>
2022-02-25 11:30:10 -05:00
Raymond Xu
b50f4b491c [HUDI-3042] Refactor clustering executors (#4847) 2022-02-25 05:39:43 -08:00
YueZhang
742810070b [HUDI-3421]Pending clustering may break AbstractTableFileSystemView#getxxBaseFile() (#4810) 2022-02-25 16:46:27 +05:30
Danny Chan
a4ee7463ae [HUDI-3474] Add more document to Pipelines for the usage of this tool to build a write pipeline (#4906) 2022-02-25 19:08:51 +08:00
todd5167
45d1216e91 [HUDI-3401] fix NPE caused by incorrect beforeKeyGenClassName validation (#4774) 2022-02-24 23:31:29 -05:00
YueZhang
3694485609 [HUDI-3429] Support clustering scheduleAndExecute for hudi-cli and add clustering-cli Tests (#4817)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-02-24 23:28:38 -05:00
ForwardXu
aa1810d737 [HUDI-3493] Not table to get execution plan (#4894) 2022-02-24 17:04:44 -08:00
Alexey Kudinkin
85e8a5c4de [HUDI-1296] Support Metadata Table in Spark Datasource (#4789)
* Bootstrapping initial support for Metadata Table in Spark Datasource

- Consolidated Avro/Row conversion utilities to center around Spark's AvroDeserializer ; removed duplication
- Bootstrapped HoodieBaseRelation
- Updated HoodieMergeOnReadRDD to be able to handle Metadata Table
- Modified MOR relations to be able to read different Base File formats (Parquet, HFile)
2022-02-24 16:23:13 -05:00
ForwardXu
521338b4d9 [HUDI-3161] Add Call Produce Command for Spark SQL (#4535) 2022-02-24 07:45:37 -08:00
yanenze
943b99775b [HUDI-3488] The flink small file list should exclude file slices with pending compaction (#4893)
# this happens when the async-compaction has been configured

Co-authored-by: yanenze <yanenze@keytop.com.cn>
2022-02-24 14:45:03 +08:00
Sivabalan Narayanan
62605be413 [HUDI-3480][HUDI-3481] Enchancements to integ test suite (#4884) 2022-02-23 15:56:35 -05:00
leesf
2a93b8efb2 [HUDI-3489] Unify config to avoid duplicate code (#4883) 2022-02-23 08:14:30 -05:00
Y Ethan Guo
4e8accc179 [HUDI-3486] Fix wrong field order for constructing HoodieMetadataColumnStats (#4875) 2022-02-23 10:27:02 +05:30
yuzhaojing
dabae80423 [HUDI-3420] Remove duplicates type in HoodieClusteringGroup.avsc (#4808)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2022-02-23 10:49:47 +08:00