Sagar Sumit
eee96e9af3
[HUDI-3593] Restore TypedProperties and flush checksum in table config (#5013)
Create new TypedProperties while performing clustering
Add OrderedProperties and minor refactoring
Add javadoc and remove getters from OrderedProperties
2022-03-13 07:58:55 +05:30
Sivabalan Narayanan
e7bb0413af
[HUDI-3556] Reuse rollback instant when rolling back clustering and compaction if the rollback failed midway (#4971)
2022-03-11 18:40:13 -05:00
wangxianghu
e8918b6c2c
[HUDI-3569] Introduce ChainedJsonKafkaSourePostProcessor to support setting multiple processors at once (#4969)
2022-03-11 17:49:30 -05:00
RexAn
93277b2bcd
[HUDI-3592] Fix NPE in DefaultHoodieRecordPayload if properties are empty (#4999)
Co-authored-by: Rex An <bonean131@gmail.com>
2022-03-11 17:45:40 -05:00
Alexey Kudinkin
5d59bf67ae
[HUDI-3513] Make sure Column Stats does not fail in case it fails to load previous Index Table state (#5015)
2022-03-11 17:39:22 -05:00
huberylee
56cb49485d
[HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable (#4982)
2022-03-11 13:23:19 -08:00
wangxianghu
b00180342e
[HUDI-3575] Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor (#5019)
2022-03-11 15:03:42 +04:00
苏承祥
faed6996ee
[HUDI-3566] Add thread factory in BoundedInMemoryExecutor (#4926)
Co-authored-by: 苏承祥 <sucx@tuya.com>
2022-03-11 18:58:49 +08:00
Yuwei XIAO
18cdad9206
[HUDI-2999] [RFC-42] RFC for consistent hashing index (#4326)
* [HUDI-2999] rfc for consistent hashing index
* [HUDI-2999] review: add metadata table & non-dual-write solution (virtual log file) for resizing
Co-authored-by: xiaoyuwei <xiaoyuwei.yw@alibaba-inc.com>
2022-03-11 14:41:01 +08:00
wangxianghu
83cff3afee
[HUDI-3522] Introduce DropColumnSchemaPostProcessor to support dropping columns from the schema (#4972)
* [HUDI-3522] Introduce DropColumnSchemaPostProcessor to support dropping columns from the schema
* Fix case sensitivity
2022-03-11 09:30:37 +04:00
Sivabalan Narayanan
9dc6df5dca
[HUDI-3595] Fixing null schema provider for empty batches (#5002)
2022-03-10 22:52:55 -05:00
Y Ethan Guo
fa5e75068e
[HUDI-3586] Add Trino queries to integration tests (#4988)
2022-03-10 21:17:32 -05:00
Sagar Sumit
4e09545be4
[HUDI-3602][DOCS] Update docker README to build multi-arch images using buildx (#5011)
2022-03-10 02:38:27 -08:00
Danny Chan
ec24407191
[HUDI-3581] Reorganize some classes for Hudi Flink (#4983)
2022-03-10 15:55:15 +08:00
Alexey Kudinkin
034addaef5
[HUDI-3396] Make sure BaseFileOnlyViewRelation only reads projected columns (#4818)
NOTE: This change is the first part of a series to clean up Hudi's Spark DataSource-related implementations, making sure there is minimal code duplication among them and that the implementations are consistent and performant.
This PR makes sure that BaseFileOnlyViewRelation only reads projected columns and avoids unnecessary serde from Row to InternalRow.
Brief change log
- Introduced HoodieBaseRDD as a base for all custom RDD impls
- Extracted common fields/methods to HoodieBaseRelation
- Cleaned up and streamlined HoodieBaseFileViewOnlyRelation
- Fixed all of the Relations to avoid superfluous Row <> InternalRow conversions
2022-03-09 21:45:25 -05:00
ForwardXu
ca0b8fccee
[MINOR] Add IT CI Test timeout option (#5003)
2022-03-09 18:04:36 -08:00
MrSleeping123
8859b48b2a
[HUDI-3383] Sync column comments while syncing a Hive table (#4960)
Desc: Add a Hive sync config (hoodie.datasource.hive_sync.sync_comment), which defaults to false.
When syncing a data source to Hudi, if column comments are added to the source Avro schema and sync_comment is true, the column comments are synced to the Hive table.
2022-03-10 09:44:39 +08:00
wangxianghu
548000b0d6
[HUDI-3568] Introduce ChainedSchemaPostProcessor to support setting multiple processors at once (#4968)
2022-03-09 11:16:22 +04:00
Sivabalan Narayanan
4324e874ae
[HUDI-3587] Making SupportsUpgradeDowngrade serializable (#4991)
2022-03-09 00:04:42 -05:00
ForwardXu
08fd80c913
[HUDI-3221] Support querying a table as of a savepoint (#4720)
2022-03-08 10:02:34 -08:00
Sagar Sumit
575bc63468
[HUDI-3356][HUDI-3203] HoodieData for metadata index records; BloomFilter construction from index based on the type param (#4848)
Rework of #4761
This diff introduces the following changes:
- Write stats are converted to metadata index records during the commit. They now use the HoodieData type so that record generation scales with need.
- Metadata index init support for bloom filter and column stats partitions.
- When building the BloomFilter from the index records, use the type param stored in the payload instead of a hardcoded type.
- Delta writes can change column ranges, and the column stats index needs to be updated with the new ranges to stay consistent with the table dataset. This fix adds column stats index update support for delta writes.
Co-authored-by: Manoj Govindassamy <manoj.govindassamy@gmail.com>
2022-03-08 10:39:04 -05:00
Raymond Xu
ed26c5265c
[HUDI-3584] Skip integ test modules by default (#4986)
2022-03-08 06:32:04 -08:00
ForwardXu
25385805aa
[HUDI-3574] Improve maven module configs for different spark profiles (#4970)
2022-03-08 01:01:05 -08:00
Danny Chan
fe53bd2dea
[HUDI-2677] Add DFS-based message queue for Flink writer [part3] (#4961)
2022-03-08 15:43:21 +08:00
Bo
b6bdb46f7f
[MINOR][HUDI-3460] Fix HoodieDataSourceITCase
close #4959
2022-03-08 12:18:43 +08:00
todd5167
34bc752853
[HUDI-3573] Flink CleanFunction executes clean on initialization (#4936)
For the Flink insert overwrite operation, perform cleaning each time before the write.
2022-03-08 11:53:54 +08:00
Sivabalan Narayanan
29040762fa
[HUDI-3576] Configuring timeline refreshes based on latest commit (#4973)
2022-03-07 17:01:49 -05:00
YueZhang
53826d69e4
[HUDI-2747] Support setting --sparkMaster for MDT CLI (#4964)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-03-07 16:57:03 -05:00
Alexey Kudinkin
a66fd40692
[HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS (#4739)
- This change makes sure MT records are updated appropriately on HDFS: previously, after Log File append operations, MT records were updated with just the size of the deltas appended to the original files. This was found to cause issues with Rollbacks, which instead updated the MT with records bearing the full file size.
- To hedge against similar issues going forward, this PR removes this discrepancy and streamlines the flow so that the MT always ingests records bearing full file sizes.
2022-03-07 15:38:27 -05:00
Alexey Kudinkin
f0bcee3c01
[HUDI-3561] Avoid including whole MultipleSparkJobExecutionStrategy object into the closure for Spark to serialize (#4954)
- Avoid including whole MultipleSparkJobExecutionStrategy object into the closure for Spark to serialize
2022-03-07 13:42:03 -05:00
Sivabalan Narayanan
3539578ccb
[HUDI-3213] Setting preserve commit metadata to true for compaction (#4811)
* Setting preserve commit metadata to true
* Fixing integ tests
* Fixing preserve commit metadata for metadata table
* fixed bootstrap tests
* temp diff
* Fixing merge handle
* renaming fallback record
* fixing build issue
* Fixing test failures
2022-03-07 18:02:05 +05:30
苏承祥
6f57bbfac4
[HUDI-3069] Improve HoodieMergedLogRecordScanner to avoid putting unnecessary hoodie records (#4932)
* log scanner optimization
* payload equals switches to `=`
Co-authored-by: 苏承祥 <sucx@tuya.com>
2022-03-07 14:35:55 +08:00
wangxianghu
c9ffdc493e
[HUDI-3525] Introduce JsonkafkaSourceProcessor to support data preprocessing before it is transformed to a DataSet (#4930)
2022-03-06 15:41:01 -05:00
wangxianghu
4b471772aa
[HUDI-3520] Introduce DeleteSupportSchemaPostProcessor to support adding _hoodie_is_deleted column to schema (#4921)
2022-03-06 15:37:09 -05:00
Aditya Tiwari
051ad0b033
[HUDI-3130] Fixing Hive getSchema for RT tables addressing different partitions having different schemas (#4468)
* Fixing Hive getSchema for RT tables
* Addressing feedback
* temp diff
* fixing tests after spark datasource read support for metadata table is merged to master
* Adding multi-partition schema evolution tests to HoodieRealTimeRecordReader
Co-authored-by: Aditya Tiwari <aditya.tiwari@flipkart.com>
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-03-06 07:51:35 +05:30
Sivabalan Narayanan
6a46130037
[HUDI-2761] Fixing timeline server for repeated refreshes (#4812)
* Fixing timeline server for repeated refreshes
2022-03-05 10:04:16 +08:00
Bo Cui
0986d5a01d
[HUDI-3460] Add reader merge memory option for flink (#4911)
* Flink TM memory optimization
2022-03-04 19:29:29 +08:00
Raymond Xu
b4362fac45
[HUDI-3348] Add UT to verify HoodieRealtimeFileSplit serde (#4951)
2022-03-04 11:19:16 +04:00
Yuwei XIAO
f449807630
[MINOR] Fix UTC timezone config (#4950)
2022-03-04 11:09:39 +04:00
ForwardXu
6faed3d90a
[HUDI-3161][RFC-47] Add Call Procedure Command for Spark SQL (#4607)
2022-03-03 20:02:46 -08:00
shibei
62f534d002
[HUDI-3445] Support Clustering Command Based on Call Procedure Command for Spark SQL (#4901)
* [HUDI-3445] Clustering Command Based on Call Procedure Command for Spark SQL
Co-authored-by: shibei <huberylee.li@alibaba-inc.com>
2022-03-04 09:33:16 +08:00
RexAn
be9a264885
[HUDI-3548] Fix async clustering not working when the user specifies the key "hoodie.datasource.clustering.async.enable" directly (#4905)
Co-authored-by: Rex An <bonean131@gmail.com>
2022-03-03 19:14:07 -05:00
Danny Chan
a4ba0fff07
[HUDI-3552] Strengthen NetworkUtils#getHostname by checking network interfaces first (#4942)
* In some complex network environments, the current code returns the wildcard address 0.0.0.0, which is not desired.
2022-03-03 21:11:08 +08:00
Sivabalan Narayanan
876a891979
[HUDI-3544] Fixing "populate meta fields" update to metadata table (#4941)
* Fixing populateMeta fields update to metadata table
* Fix checkstyle violations
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2022-03-03 17:02:25 +05:30
Manoj Govindassamy
51ee5005a6
[HUDI-2973] RFC-27: Data skipping index to improve query performance (#4728)
- Updating the schema used for data skipping index
2022-03-03 15:56:22 +05:30
Pratyaksh Sharma
907e60c252
[HUDI-3264] Make schema registry URLs configurable with MTDS (#4779)
2022-03-02 15:30:41 -05:00
liujinhui
527bd34b1c
[MINOR] RFC-38 markdown content error (#4933)
* Minor content error
2022-03-02 19:40:28 +04:00
Sivabalan Narayanan
f8945eca08
[MINOR] Adding more test props to integ tests (#4935)
2022-03-02 08:10:43 -05:00
Danny Chan
1d57bd17c2
[MINOR] Cosmetic changes following HUDI-3315 (#4934)
2022-03-02 17:44:52 +08:00
Gary Li
10d866f083
[HUDI-3315] RFC-35 Part-1 Support bucket index in Flink writer (#4679)
* Support bucket index in Flink writer
* Use record key as default index key
2022-03-02 15:14:44 +08:00