* [HUDI-3030] InProcessLockProvider as default when any async services enabled with no lock provider override
- Making InProcessLockProvider the default lock provider when
any async services are enabled and no lock provider is
explicitly set.
- This is a workaround for metadata table updates racing with
async table service operations.
* [HUDI-3030] InProcessLockProvider as default when any async services enabled with no lock provider override
- Renaming isAnyTableServicesInline/Async() to areAnyTableServicesInline/Async()
* [HUDI-3030] InProcessLockProvider as default when any async services enabled with no lock provider override
- Additionally checking for write config properties when verifying
the lock provider override. Updated the unit test for this case.
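The default-selection behavior described above can be sketched as follows. This is a minimal sketch; apart from the InProcessLockProvider class name, the helper and the config key shown here are hypothetical stand-ins, not Hudi's actual API:

```java
import java.util.Properties;

public class LockProviderDefaults {
    // Illustrative config key and provider class name
    static final String LOCK_PROVIDER_KEY = "hoodie.write.lock.provider";
    static final String IN_PROCESS_LOCK_PROVIDER =
        "org.apache.hudi.client.transaction.lock.InProcessLockProvider";

    // If any async table service is enabled and no lock provider was
    // explicitly configured, fall back to the in-process lock provider;
    // an explicit override always wins.
    static String resolveLockProvider(Properties props, boolean anyAsyncServicesEnabled) {
        String configured = props.getProperty(LOCK_PROVIDER_KEY);
        if (configured == null && anyAsyncServicesEnabled) {
            return IN_PROCESS_LOCK_PROVIDER;
        }
        return configured; // may be null when nothing applies
    }
}
```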
* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
The timestamp-based key generator was returning different values for the row-writer and non-row-writer paths. This patch fixes the discrepancy, guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`).
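The core of handling a logical timestamp consistently can be illustrated as below. This is a sketch under assumptions: Avro's timestamp-micros logical type stores a long of microseconds since epoch, and the output pattern here is only an example, not the generator's actual format:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class LogicalTimestampExample {
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    // Interpreting the long consistently as epoch micros (instead of
    // treating it as raw millis or seconds in one writer path) keeps
    // the generated partition/record keys identical across paths.
    static String formatTimestampMicros(long micros) {
        Instant instant = Instant.ofEpochSecond(
            micros / 1_000_000L, (micros % 1_000_000L) * 1_000L);
        return FMT.format(instant);
    }
}
```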
- HoodieMetadataMergedLogRecordReader#getRecordsByKeys() and its parent class methods
are not thread safe. When multiple queries come in for getting log records
by keys, they all operate on the same log record reader instance provided by
HoodieBackedTableMetadata#openReadersIfNeeded() and trip over each other
as they clear/put/get the same class member records.
- The fix is to streamline the mutation of the class member records by making
HoodieMetadataMergedLogRecordReader#getRecordsByKeys() a synchronized method,
so that concurrent log record readers do not run into NPEs.
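The synchronization fix can be illustrated with a simplified reader. The class and the stand-in "log scan" below are hypothetical; the point is that synchronizing the whole lookup makes the clear/put/get sequence on the shared member atomic per instance:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SimplifiedLogRecordReader {
    // Shared member, cleared and repopulated on every lookup -- the
    // source of the race when multiple queries reuse one reader instance.
    private final Map<String, String> records = new HashMap<>();

    // synchronized ensures no other caller can interleave its
    // clear/put/get with ours on the same instance.
    public synchronized Map<String, String> getRecordsByKeys(List<String> keys) {
        records.clear();
        for (String key : keys) {
            records.put(key, "record-for-" + key); // stand-in for a log scan
        }
        return new HashMap<>(records);
    }
}
```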
* [HUDI-2154] Add index key field to HoodieKey
* [HUDI-2157] Add the bucket index and its read/write implementation for the Spark engine.
* revert HUDI-2154 add index key field to HoodieKey
* fix all comments and introduce a new tricky way to get the index key at runtime
* support double insert for bucket index
* revert spark read optimizer based on bucket index
* add the storage layout
* index tag, hash function and add ut
* fix ut
* address partial comments
* Code review feedback
* add layout config and docs
* fix ut
* rename hoodie.layout and rebase master
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
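A bucket index assigns each record to a fixed bucket by hashing its index key, so upserts can locate their file group without an index lookup. A minimal sketch of that mapping follows; the hash choice here is illustrative, not necessarily what HUDI-2157 ships:

```java
public class BucketIdentifierSketch {
    // Map an index key to a bucket in [0, numBuckets) with a
    // non-negative hash; records with the same key always land in
    // the same bucket.
    static int getBucketId(String indexKey, int numBuckets) {
        return (indexKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }
}
```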
* [HUDI-2923] Fixing metadata table reader when metadata compaction is inflight
* Fixing retry of pending compaction in metadata table and enhancing tests
- Fetching partition files or all partitions from the metadata table is failing
when run over S3. Metadata table uses HFile format for the base files and the
record lookup uses HFile.Reader and HFileScanner interfaces to get records by
partition keys. When the backing storage is S3, this record lookup from HFiles
is failing with IOException, in turn failing the caller commit/update operations.
- Metadata table looks up HFile records with positional read enabled so as to
perform better for random lookups. But this positional read key lookup returns
partial read sizes over S3, leading to the HFile scanner throwing
IOException. This doesn't happen over HDFS. Although the metadata table uses
HFiles for random key lookups, positional read is not mandatory, since the keys
are sorted when doing a lookup for multiple keys.
- The fix is to disable HFile positional read for all HFile scanner based
key lookups.
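Why sorted keys make positional (seek-based) reads unnecessary can be sketched as follows: with both the file's entries and the requested keys in sorted order, one forward pass over the entries serves the whole multi-key lookup, with no backward seeks. The class and the TreeMap stand-in for an HFile are illustrative only:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedKeyLookupSketch {
    // A single forward scan over sorted entries resolves all sorted
    // keys; the iterator only ever advances, mimicking a sequential
    // (non-positional) HFile scan.
    static List<String> lookupSorted(TreeMap<String, String> fileEntries,
                                     List<String> sortedKeys) {
        List<String> results = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = fileEntries.entrySet().iterator();
        Map.Entry<String, String> entry = it.hasNext() ? it.next() : null;
        for (String key : sortedKeys) {
            // advance forward until we reach or pass the requested key
            while (entry != null && entry.getKey().compareTo(key) < 0) {
                entry = it.hasNext() ? it.next() : null;
            }
            if (entry != null && entry.getKey().equals(key)) {
                results.add(entry.getValue());
            }
        }
        return results;
    }
}
```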