- Adopt HoodieData in Spark action commit executors
- Make Spark independent DeleteHelper, WriteHelper, MergeHelper in hudi-client-common
- Make HoodieTable in WriteClient APIs have raw type to decouple with Client's generic types
Create new TypedProperties while performing clustering
Add OrderedProperties and minor refactoring
Add javadoc and remove getters from OrderedProperties
Desc: Add a hive sync config(hoodie.datasource.hive_sync.sync_comment). This config defaults to false.
While syncing data source to hudi, add column comments to source avro schema, and the sync_comment is true, syncing column comments to the hive table.
- This change makes sure MT records are updated appropriately on HDFS: previously after Log File append operations MT records were updated w/ just the size of the deltas being appended to the original files, which have been found to be the cause of issues in case of Rollbacks that were instead updating MT with records bearing the full file-size.
- To make sure that we hedge against similar issues going f/w, this PR alleviates this discrepancy and streamlines the flow of MT table always ingesting records bearing full file-sizes.
* Bootstrapping initial support for Metadata Table in Spark Datasource
- Consolidated Avro/Row conversion utilities to center around Spark's AvroDeserializer ; removed duplication
- Bootstrapped HoodieBaseRelation
- Updated HoodieMergeOnReadRDD to be able to handle Metadata Table
- Modified MOR relations to be able to read different Base File formats (Parquet, HFile)
Fix dependency conflict
Fix repairs command
Implement putIfAbsent for DDB lock provider
Add upgrade step and validate while fetching configs
Validate checksum for latest table version only while fetching config
Move generateChecksum to BinaryUtil
Rebase and resolve conflict
Fix table version check
This change is addressing issues in regards to Metadata Table observing ingesting duplicated records leading to it persisting incorrect file-sizes for the files referred to in those records.
There are multiple issues that were leading to that:
- [HUDI-3322] Incorrect Rollback Plan generation: Rollback Plan generated for MOR tables was overly expansively listing all log-files with the latest base-instant as the ones that have been affected by the rollback, leading to invalid MT records being ingested referring to those.
- [HUDI-3343] Metadata Table including Uncommitted Log Files during Bootstrap: Since MT is bootstrapped at the end of the commit operation execution (after FS activity, but before committing to the timeline), it was actually incorrectly ingesting some files that were part of the intermediate state of the operation being committed.
This change will unblock Stack of PRs based off #4556
* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
Timestampbased key generator was returning diff values for row writer and non row writer path. this patch fixes it and is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`)
* [HUDI-2923] Fixing metadata table reader when metadata compaction is inflight
* Fixing retry of pending compaction in metadata table and enhancing tests
* Rebased `DFSPropertiesConfiguration` to access Hadoop config in liue of FS to avoid confusion
* Fixed `readConfig` to take Hadoop's `Configuration` instead of FS;
Fixing usages
* Added test for local FS access
* Rebase to use `FSUtils.getFs`
* Combine properties provided as a file along w/ overrides provided from the CLI
* Added helper utilities to `HoodieClusteringConfig`;
Make sure corresponding config methods fallback to defaults;
* Fixed DeltaStreamer usage to respect properly combined configuration;
Abstracted `HoodieClusteringConfig.from` convenience utility to init Clustering config from `Properties`
* Tidying up
* `lint`
* Reverting changes to `HoodieWriteConfig`
* Tdiying up
* Fixed incorrect merge of the props
* Converted `HoodieConfig` to wrap around `Properties` into `TypedProperties`
* Fixed compilation
* Fixed compilation
* Making error -> warn logs from timeline server with concurrent writers for inconsistent state
* Fixing bad request response exception for timeline out of sync
* Addressing feedback. removed write concurrency mode depedency