Desc: Add a hive sync config(hoodie.datasource.hive_sync.sync_comment). This config defaults to false.
While syncing data source to hudi, add column comments to source avro schema, and the sync_comment is true, syncing column comments to the hive table.
- This change makes sure MT records are updated appropriately on HDFS: previously after Log File append operations MT records were updated w/ just the size of the deltas being appended to the original files, which have been found to be the cause of issues in case of Rollbacks that were instead updating MT with records bearing the full file-size.
- To make sure that we hedge against similar issues going f/w, this PR alleviates this discrepancy and streamlines the flow of MT table always ingesting records bearing full file-sizes.
* Bootstrapping initial support for Metadata Table in Spark Datasource
- Consolidated Avro/Row conversion utilities to center around Spark's AvroDeserializer ; removed duplication
- Bootstrapped HoodieBaseRelation
- Updated HoodieMergeOnReadRDD to be able to handle Metadata Table
- Modified MOR relations to be able to read different Base File formats (Parquet, HFile)
Fix dependency conflict
Fix repairs command
Implement putIfAbsent for DDB lock provider
Add upgrade step and validate while fetching configs
Validate checksum for latest table version only while fetching config
Move generateChecksum to BinaryUtil
Rebase and resolve conflict
Fix table version check
This change is addressing issues in regards to Metadata Table observing ingesting duplicated records leading to it persisting incorrect file-sizes for the files referred to in those records.
There are multiple issues that were leading to that:
- [HUDI-3322] Incorrect Rollback Plan generation: Rollback Plan generated for MOR tables was overly expansively listing all log-files with the latest base-instant as the ones that have been affected by the rollback, leading to invalid MT records being ingested referring to those.
- [HUDI-3343] Metadata Table including Uncommitted Log Files during Bootstrap: Since MT is bootstrapped at the end of the commit operation execution (after FS activity, but before committing to the timeline), it was actually incorrectly ingesting some files that were part of the intermediate state of the operation being committed.
This change will unblock Stack of PRs based off #4556
* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
Timestampbased key generator was returning diff values for row writer and non row writer path. this patch fixes it and is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`)
* [HUDI-2923] Fixing metadata table reader when metadata compaction is inflight
* Fixing retry of pending compaction in metadata table and enhancing tests
* Rebased `DFSPropertiesConfiguration` to access Hadoop config in liue of FS to avoid confusion
* Fixed `readConfig` to take Hadoop's `Configuration` instead of FS;
Fixing usages
* Added test for local FS access
* Rebase to use `FSUtils.getFs`
* Combine properties provided as a file along w/ overrides provided from the CLI
* Added helper utilities to `HoodieClusteringConfig`;
Make sure corresponding config methods fallback to defaults;
* Fixed DeltaStreamer usage to respect properly combined configuration;
Abstracted `HoodieClusteringConfig.from` convenience utility to init Clustering config from `Properties`
* Tidying up
* `lint`
* Reverting changes to `HoodieWriteConfig`
* Tdiying up
* Fixed incorrect merge of the props
* Converted `HoodieConfig` to wrap around `Properties` into `TypedProperties`
* Fixed compilation
* Fixed compilation
* Making error -> warn logs from timeline server with concurrent writers for inconsistent state
* Fixing bad request response exception for timeline out of sync
* Addressing feedback. removed write concurrency mode depedency
* [HUDI-2443] Hudi KVComparator for all HFile writer usages
- Hudi relies on custom class shading for Hbase's KeyValue.KVComparator to
avoid versioning and class loading issues. There are few places which are
still using the Hbase's comparator class directly and version upgrades
would make them obsolete. Refactoring the HoodieKVComparator and making
all HFile writer creation using the same shaded class.
* [HUDI-2443] Hudi KVComparator for all HFile writer usages
- Moving HoodieKVComparator from common.bootstrap.index to common.util
* [HUDI-2443] Hudi KVComparator for all HFile writer usages
- Retaining the old HoodieKVComparatorV2 for boostrap case. Adding the
new comparator as HoodieKVComparatorV2 to differentiate from the old
one.
* [HUDI-2443] Hudi KVComparator for all HFile writer usages
- Renamed HoodieKVComparatorV2 to HoodieMetadataKVComparator and moved it
under the package org.apache.hudi.metadata.
* Make comparator classname configurable
* Revert new config and address other review comments
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
- Adds support for generating commit timestamps with millisecs granularity.
- Older commit timestamps (in secs granularity) will be suffixed with 999 and parsed with millisecs format.
* [HUDI-2795] Add mechanism to safely update,delete and recover table properties
- Fail safe mechanism, that lets queries succeed off a backup file
- Readers who are not upgraded to this version of code will just fail until recovery is done.
- Added unit tests that exercises all these scenarios.
- Adding CLI for recovery, updation to table command.
- [Pending] Add some hash based verfication to ensure any rare partial writes for HDFS
* Fixing upgrade/downgrade infrastructure to use new updation method