Fix dependency conflict
Fix repairs command
Implement putIfAbsent for DDB lock provider
Add upgrade step and validate while fetching configs
Validate checksum for latest table version only while fetching config
Move generateChecksum to BinaryUtil
Rebase and resolve conflict
Fix table version check
This change is addressing issues in regards to Metadata Table observing ingesting duplicated records leading to it persisting incorrect file-sizes for the files referred to in those records.
There are multiple issues that were leading to that:
- [HUDI-3322] Incorrect Rollback Plan generation: Rollback Plan generated for MOR tables was overly expansively listing all log-files with the latest base-instant as the ones that have been affected by the rollback, leading to invalid MT records being ingested referring to those.
- [HUDI-3343] Metadata Table including Uncommitted Log Files during Bootstrap: Since MT is bootstrapped at the end of the commit operation execution (after FS activity, but before committing to the timeline), it was actually incorrectly ingesting some files that were part of the intermediate state of the operation being committed.
This change will unblock Stack of PRs based off #4556
* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
Timestampbased key generator was returning diff values for row writer and non row writer path. this patch fixes it and is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`)
* [HUDI-2923] Fixing metadata table reader when metadata compaction is inflight
* Fixing retry of pending compaction in metadata table and enhancing tests
* Rebased `DFSPropertiesConfiguration` to access Hadoop config in liue of FS to avoid confusion
* Fixed `readConfig` to take Hadoop's `Configuration` instead of FS;
Fixing usages
* Added test for local FS access
* Rebase to use `FSUtils.getFs`
* Combine properties provided as a file along w/ overrides provided from the CLI
* Added helper utilities to `HoodieClusteringConfig`;
Make sure corresponding config methods fallback to defaults;
* Fixed DeltaStreamer usage to respect properly combined configuration;
Abstracted `HoodieClusteringConfig.from` convenience utility to init Clustering config from `Properties`
* Tidying up
* `lint`
* Reverting changes to `HoodieWriteConfig`
* Tdiying up
* Fixed incorrect merge of the props
* Converted `HoodieConfig` to wrap around `Properties` into `TypedProperties`
* Fixed compilation
* Fixed compilation
* Making error -> warn logs from timeline server with concurrent writers for inconsistent state
* Fixing bad request response exception for timeline out of sync
* Addressing feedback. removed write concurrency mode depedency
* [HUDI-2443] Hudi KVComparator for all HFile writer usages
- Hudi relies on custom class shading for Hbase's KeyValue.KVComparator to
avoid versioning and class loading issues. There are few places which are
still using the Hbase's comparator class directly and version upgrades
would make them obsolete. Refactoring the HoodieKVComparator and making
all HFile writer creation using the same shaded class.
* [HUDI-2443] Hudi KVComparator for all HFile writer usages
- Moving HoodieKVComparator from common.bootstrap.index to common.util
* [HUDI-2443] Hudi KVComparator for all HFile writer usages
- Retaining the old HoodieKVComparatorV2 for boostrap case. Adding the
new comparator as HoodieKVComparatorV2 to differentiate from the old
one.
* [HUDI-2443] Hudi KVComparator for all HFile writer usages
- Renamed HoodieKVComparatorV2 to HoodieMetadataKVComparator and moved it
under the package org.apache.hudi.metadata.
* Make comparator classname configurable
* Revert new config and address other review comments
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
- Adds support for generating commit timestamps with millisecs granularity.
- Older commit timestamps (in secs granularity) will be suffixed with 999 and parsed with millisecs format.
* [HUDI-2795] Add mechanism to safely update,delete and recover table properties
- Fail safe mechanism, that lets queries succeed off a backup file
- Readers who are not upgraded to this version of code will just fail until recovery is done.
- Added unit tests that exercises all these scenarios.
- Adding CLI for recovery, updation to table command.
- [Pending] Add some hash based verfication to ensure any rare partial writes for HDFS
* Fixing upgrade/downgrade infrastructure to use new updation method
- Metadata table today has virtual keys disabled, thereby populating the metafields
for each record written out and increasing the overall storage space used. Hereby
adding virtual keys support for metadata table so that metafields are disabled
for metadata table records.
- Adding a custom KeyGenerator for Metadata table so as to not rely on the
default Base/SimpleKeyGenerators which currently look for record key
and partition field set in the table config.
- AbstractHoodieLogRecordReader's version of processing next data block and
createHoodieRecord() will be a generic version and making the derived class
HoodieMetadataMergedLogRecordReader take care of the special creation of
records from explictly passed in partition names.
- ExternalSpillableMap does the payload/value size estimation on the first put to
determine when to spill over to disk map. The payload size re-estimation also
happens after a minimum threshold of puts. This size re-estimation goes my the
current in-memory map size for calculating average payload size and does attempts
divide by zero operation when the map is size is empty. Avoiding the
ArithmeticException during the payload size re-estimate by checking the map size
upfront.