Alexey Kudinkin
f0bcee3c01
[HUDI-3561] Avoid including whole MultipleSparkJobExecutionStrategy object into the closure for Spark to serialize ( #4954 )
...
- Avoid including whole MultipleSparkJobExecutionStrategy object into the closure for Spark to serialize
2022-03-07 13:42:03 -05:00
Sivabalan Narayanan
3539578ccb
[HUDI-3213] Making commit preserve metadata to true for compaction ( #4811 )
...
* Making commit preserve metadata to true
* Fixing integ tests
* Fixing preserve commit metadata for metadata table
* fixed bootstrap tests
* temp diff
* Fixing merge handle
* renaming fallback record
* fixing build issue
* Fixing test failures
2022-03-07 18:02:05 +05:30
苏承祥
6f57bbfac4
[HUDI-3069] Improve HoodieMergedLogRecordScanner to avoid putting unnecessary hoodie records ( #4932 )
...
* log scanner optimization
* payload equals switches to `=`
Co-authored-by: 苏承祥 <sucx@tuya.com>
2022-03-07 14:35:55 +08:00
wangxianghu
c9ffdc493e
[HUDI-3525] Introduce JsonkafkaSourceProcessor to support data preprocessing before it is transformed to DataSet ( #4930 )
2022-03-06 15:41:01 -05:00
wangxianghu
4b471772aa
[HUDI-3520] Introduce DeleteSupportSchemaPostProcessor to support adding _hoodie_is_deleted column to schema ( #4921 )
2022-03-06 15:37:09 -05:00
Aditya Tiwari
051ad0b033
[HUDI-3130] Fixing Hive getSchema for RT tables addressing different partitions having different schemas ( #4468 )
...
* Fixing Hive getSchema for RT tables
* Addressing feedback
* temp diff
* fixing tests after spark datasource read support for metadata table is merged to master
* Adding multi-partition schema evolution tests to HoodieRealTimeRecordReader
Co-authored-by: Aditya Tiwari <aditya.tiwari@flipkart.com>
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-03-06 07:51:35 +05:30
Sivabalan Narayanan
6a46130037
[HUDI-2761] Fixing timeline server for repeated refreshes ( #4812 )
...
* Fixing timeline server for repeated refreshes
2022-03-05 10:04:16 +08:00
Bo Cui
0986d5a01d
[HUDI-3460] Add reader merge memory option for flink ( #4911 )
...
* flink TM memory Optimization
2022-03-04 19:29:29 +08:00
Raymond Xu
b4362fac45
[HUDI-3348] Add UT to verify HoodieRealtimeFileSplit serde ( #4951 )
2022-03-04 11:19:16 +04:00
Yuwei XIAO
f449807630
[MINOR] fix UTC timezone config ( #4950 )
2022-03-04 11:09:39 +04:00
ForwardXu
6faed3d90a
[HUDI-3161][RFC-47] Add Call Procedure Command for Spark SQL ( #4607 )
2022-03-03 20:02:46 -08:00
shibei
62f534d002
[HUDI-3445] Support Clustering Command Based on Call Procedure Command for Spark SQL ( #4901 )
...
* [HUDI-3445] Clustering Command Based on Call Procedure Command for Spark SQL
Co-authored-by: shibei <huberylee.li@alibaba-inc.com>
2022-03-04 09:33:16 +08:00
RexAn
be9a264885
[HUDI-3548] Fix async clustering not working when the user specifies key "hoodie.datasource.clustering.async.enable" directly ( #4905 )
...
Co-authored-by: Rex An <bonean131@gmail.com>
2022-03-03 19:14:07 -05:00
Danny Chan
a4ba0fff07
[HUDI-3552] Strengthen the NetworkUtils#getHostname by checking network interfaces first ( #4942 )
...
* In some complex network environments, the current code returns the wildcard address 0.0.0.0, which is not desired.
2022-03-03 21:11:08 +08:00
Sivabalan Narayanan
876a891979
[HUDI-3544] Fixing "populate meta fields" update to metadata table ( #4941 )
...
* Fixing populateMeta fields update to metadata table
* Fix checkstyle violations
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2022-03-03 17:02:25 +05:30
Manoj Govindassamy
51ee5005a6
[HUDI-2973] RFC-27: Data skipping index to improve query performance ( #4728 )
...
- Updating the schema used for data skipping index
2022-03-03 15:56:22 +05:30
Pratyaksh Sharma
907e60c252
[HUDI-3264]: made schema registry urls configurable with MTDS ( #4779 )
2022-03-02 15:30:41 -05:00
liujinhui
527bd34b1c
[MINOR] RFC-38 markdown content error ( #4933 )
...
* Minor content error
2022-03-02 19:40:28 +04:00
Sivabalan Narayanan
f8945eca08
[MINOR] Adding more test props to integ tests ( #4935 )
2022-03-02 08:10:43 -05:00
Danny Chan
1d57bd17c2
[MINOR] Cosmetic changes following HUDI-3315 ( #4934 )
2022-03-02 17:44:52 +08:00
Gary Li
10d866f083
[HUDI-3315] RFC-35 Part-1 Support bucket index in Flink writer ( #4679 )
...
* Support bucket index in Flink writer
* Use record key as default index key
2022-03-02 15:14:44 +08:00
Alexey Kudinkin
85f47b53df
[HUDI-3469] Refactor HoodieTestDataGenerator to provide for reproducible builds ( #4866 )
2022-03-01 22:15:26 -08:00
yuzhaojing
3b2da9f138
[HUDI-2631] In CompactFunction, set up the write schema each time with the latest schema ( #4000 )
...
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2022-03-02 11:18:17 +08:00
stayrascal
3cfb52c413
[MINOR] fix get builtin function issue from Hudi catalog ( #4917 )
2022-03-02 11:16:19 +08:00
Bo Cui
3fdc9332e5
[HUDI-3516] Implement record iterator for HoodieDataBlock ( #4909 )
...
* Use iterator to avoid eager materialization to be memory friendly
2022-03-02 10:19:36 +08:00
ForwardXu
a81a6326d5
[HUDI-3441] Add support for "marker delete" in hudi-cli ( #4922 )
2022-03-01 16:03:53 +08:00
Sivabalan Narayanan
f7088a957c
[HUDI-3497] Adding Datatable validator tool ( #4902 )
2022-02-28 22:46:32 -05:00
Y Ethan Guo
257052a94d
[HUDI-3465] Add validation of column stats and bloom filters in HoodieMetadataTableValidator ( #4878 )
2022-02-28 18:49:30 -08:00
yuzhaojing
44b8ab6048
[HUDI-3418] Save timeout option for remote RemoteFileSystemView ( #4809 )
...
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2022-02-28 15:16:40 -05:00
wenningd
18dc89cf79
[HUDI-3450] Avoid passing empty string spark master to hudi cli ( #4844 )
...
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2022-02-28 11:37:24 -05:00
Y Ethan Guo
05e395ae5f
[HUDI-3341] Fix log file reader for S3 with hadoop-aws 2.7.x ( #4897 )
2022-02-28 11:14:35 -05:00
stayrascal
8f1e4f5b3e
[HUDI-3528] Fix String conversion issue and override putAll method in TypedProperties.java ( #4920 )
2022-02-28 10:45:47 -05:00
Sivabalan Narayanan
4a59876c8b
[HUDI-2917] rollback insert data appended to log file when using Hbase Index ( #4840 )
...
Co-authored-by: guanziyue <guanziyue@gmail.com>
2022-02-28 08:13:17 -05:00
Bo Cui
193215201c
[MINOR] Change MINI_BATCH_SIZE to 2048 ( #4862 )
...
ParquetColumnarRowSplitReader#batchSize is 2048, so changing MINI_BATCH_SIZE to 2048 will reduce memory cache.
2022-02-28 10:45:28 +08:00
Sivabalan Narayanan
d5444ff7ff
[HUDI-3018] Adding validation to dataframe schema to ensure reserved field does not have a different data type ( #4852 )
2022-02-27 11:59:23 -05:00
Sivabalan Narayanan
2f99e8458a
[HUDI-3521] Fixing kafka key and value serializer value type from class to string ( #4919 )
2022-02-27 11:13:13 -05:00
Raymond Xu
c77b2591d0
[HUDI-2439] Remove SparkBoundedInMemoryExecutor ( #4860 )
2022-02-26 08:02:12 -05:00
Sivabalan Narayanan
1379300b5b
[HUDI-3483] Adding insert override nodes to integ test suite and a few cleanups ( #4895 )
2022-02-26 08:00:15 -05:00
Sagar Sumit
6a5cfb45b9
[MINOR] Fix table type in input format test ( #4912 )
2022-02-25 13:51:53 -05:00
苏承祥
92cdc5987a
[HUDI-3515] Making rdd unpersist optional at the end of writes ( #4898 )
...
Co-authored-by: 苏承祥 <sucx@tuya.com>
2022-02-25 11:30:10 -05:00
Raymond Xu
b50f4b491c
[HUDI-3042] Refactor clustering executors ( #4847 )
2022-02-25 05:39:43 -08:00
YueZhang
742810070b
[HUDI-3421] Pending clustering may break AbstractTableFileSystemView#getxxBaseFile() ( #4810 )
2022-02-25 16:46:27 +05:30
Danny Chan
a4ee7463ae
[HUDI-3474] Add more documentation to Pipelines on using this tool to build a write pipeline ( #4906 )
2022-02-25 19:08:51 +08:00
todd5167
45d1216e91
[HUDI-3401] fix NPE caused by incorrect beforeKeyGenClassName validation ( #4774 )
2022-02-24 23:31:29 -05:00
YueZhang
3694485609
[HUDI-3429] Support clustering scheduleAndExecute for hudi-cli and add clustering-cli Tests ( #4817 )
...
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-02-24 23:28:38 -05:00
ForwardXu
aa1810d737
[HUDI-3493] Not able to get execution plan ( #4894 )
2022-02-24 17:04:44 -08:00
Alexey Kudinkin
85e8a5c4de
[HUDI-1296] Support Metadata Table in Spark Datasource ( #4789 )
...
* Bootstrapping initial support for Metadata Table in Spark Datasource
- Consolidated Avro/Row conversion utilities to center around Spark's AvroDeserializer; removed duplication
- Bootstrapped HoodieBaseRelation
- Updated HoodieMergeOnReadRDD to be able to handle Metadata Table
- Modified MOR relations to be able to read different Base File formats (Parquet, HFile)
2022-02-24 16:23:13 -05:00
ForwardXu
521338b4d9
[HUDI-3161] Add Call Procedure Command for Spark SQL ( #4535 )
2022-02-24 07:45:37 -08:00
yanenze
943b99775b
[HUDI-3488] The flink small file list should exclude file slices with pending compaction ( #4893 )
...
# this happens when the async-compaction has been configured
Co-authored-by: yanenze <yanenze@keytop.com.cn>
2022-02-24 14:45:03 +08:00
Sivabalan Narayanan
62605be413
[HUDI-3480][HUDI-3481] Enhancements to integ test suite ( #4884 )
2022-02-23 15:56:35 -05:00