* [HUDI-2763] Metadata table records - support for key deduplication and virtual keys
- The backing log format for the metadata table is HFile, a key-value format.
Since the key field in the metadata record payload duplicates the
key in the Cell, the redundant key field in the record can be emptied
to save storage space.
- HoodieHFileWriter and HoodieHFileDataBlock now serialize records
with the key field emptied by default. The HFile writer checks whether
the record has the metadata payload schema field 'key' and, if so,
trims the key from the record payload.
- When reading the serialized records back from disk, HoodieHFileReader
materializes any missing key fields. The HFile reader checks whether
the record has the metadata payload schema field 'key' and, if so,
materializes the key in the record payload.
- Tests have been added to verify the default virtual keys and key
deduplication support for the metadata table records.
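
The write-side trimming and read-side materialization described above can be sketched roughly as follows. This is a minimal illustration that uses a sorted in-memory map as a stand-in for an HFile; the class `KeyDedupSketch` and its methods are hypothetical and are not the actual HoodieHFileWriter/HoodieHFileReader code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an HFile: a sorted store of cell key -> record payload.
final class KeyDedupSketch {
    // Name of the payload field that duplicates the cell key.
    static final String KEY_FIELD = "key";

    private final TreeMap<String, Map<String, String>> cells = new TreeMap<>();

    // Write path: if the payload carries a 'key' field, empty it before storing,
    // because the same value is already kept as the cell key.
    void write(String cellKey, Map<String, String> record) {
        Map<String, String> stored = new HashMap<>(record);
        if (stored.containsKey(KEY_FIELD)) {
            stored.put(KEY_FIELD, "");
        }
        cells.put(cellKey, stored);
    }

    // Read path: if the payload has an emptied 'key' field, fill it back in
    // from the cell key so callers see the original record.
    Map<String, String> read(String cellKey) {
        Map<String, String> stored = cells.get(cellKey);
        if (stored == null) {
            return null;
        }
        Map<String, String> record = new HashMap<>(stored);
        String key = record.get(KEY_FIELD);
        if (key != null && key.isEmpty()) {
            record.put(KEY_FIELD, cellKey);
        }
        return record;
    }

    // Exposes the payload exactly as stored, for inspection only.
    Map<String, String> rawStored(String cellKey) {
        return cells.get(cellKey);
    }
}
```

In this sketch the stored payload keeps an empty 'key' field, while `read` returns the record with the key restored, so callers are unaffected by the trimming.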
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
* [HUDI-3030] InProcessLockProvider as default when any async services enabled with no lock provider override
- Making InProcessLockProvider the default lock provider when
any async services are enabled and no lock provider is
explicitly set.
- This is a workaround for metadata table updates racing with
async table service operations.
* [HUDI-3030] InProcessLockProvider as default when any async services enabled with no lock provider override
- Renaming isAnyTableServicesInline/Async() to areAnyTableServicesInline/Async()
* [HUDI-3030] InProcessLockProvider as default when any async services enabled with no lock provider override
- Additionally checking for write config properties when verifying
the lock provider override. Updated the unit test for this case.
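
The defaulting rule from the HUDI-3030 changes can be sketched as below. This is an illustration only: the class `LockProviderDefaultSketch` and the method `resolveLockProvider` are hypothetical, and the property key and fully qualified class name are assumptions about the Hudi configuration rather than values taken from this log.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical helper illustrating the defaulting rule; not the actual Hudi code.
final class LockProviderDefaultSketch {
    // Assumed property key for the lock provider override.
    static final String LOCK_PROVIDER_KEY = "hoodie.write.lock.provider";
    // Assumed fully qualified name of the in-process lock provider.
    static final String IN_PROCESS_LOCK_PROVIDER =
        "org.apache.hudi.client.transaction.lock.InProcessLockProvider";

    // Returns the explicit override if one is set in the write config properties;
    // otherwise the in-process provider when any async table service is enabled;
    // otherwise empty, meaning this rule does not apply.
    static Optional<String> resolveLockProvider(Properties writeProps,
                                                boolean anyTableServiceAsync) {
        String explicit = writeProps.getProperty(LOCK_PROVIDER_KEY);
        if (explicit != null && !explicit.trim().isEmpty()) {
            return Optional.of(explicit.trim());
        }
        return anyTableServiceAsync
            ? Optional.of(IN_PROCESS_LOCK_PROVIDER)
            : Optional.empty();
    }
}
```

Checking the write config properties first means an explicit override always wins over the new default.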
* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
TimestampBasedKeyGenerator was returning different values for the row-writer and non-row-writer paths. This patch fixes that; the fix is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`).
- HoodieMetadataMergedLogRecordReader#getRecordsByKeys() and its parent class methods
are not thread safe. When multiple queries come in for getting log records
by keys, they all operate on the same log record reader instance provided by
HoodieBackedTableMetadata#openReadersIfNeeded() and they trip over each other
as they clear/put/get the same class member records.
- The fix is to serialize mutations of the class member records by making
HoodieMetadataMergedLogRecordReader#getRecordsByKeys() a synchronized method,
so that concurrent log record readers no longer run into an NPE.
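
The race and the fix can be illustrated with a small sketch. The class `SharedRecordReaderSketch` below is hypothetical and only mimics the pattern described above (a shared reader instance that clears and refills a member map on every lookup); it is not the actual HoodieMetadataMergedLogRecordReader.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader that reuses one member map as scratch space per lookup.
final class SharedRecordReaderSketch {
    // Stand-in for the log contents; copied once and never modified afterwards.
    private final Map<String, String> log;
    // Shared mutable state: cleared and refilled on every call.
    private final Map<String, String> records = new HashMap<>();

    SharedRecordReaderSketch(Map<String, String> log) {
        this.log = new HashMap<>(log);
    }

    // 'synchronized' lets only one caller at a time clear/put/get 'records',
    // so concurrent lookups cannot observe each other's partial state.
    synchronized Map<String, String> getRecordsByKeys(List<String> keys) {
        records.clear();
        for (String key : keys) {
            String value = log.get(key);
            if (value != null) {
                records.put(key, value);
            }
        }
        // Return a copy so the result stays valid after the next call.
        return new HashMap<>(records);
    }
}
```

Without `synchronized`, one thread's `records.clear()` could run between another thread's `put` and its final read, which is the kind of interleaving that makes lookups trip over each other.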
* [HUDI-2154] Add index key field to HoodieKey
* [HUDI-2157] Add the bucket index and its read/write implementation for the Spark engine.
* revert HUDI-2154 add index key field to HoodieKey
* fix all comments and introduce a new tricky way to get index key at runtime
support double insert for bucket index
* revert spark read optimizer based on bucket index
* add the storage layout
* index tag, hash function and add ut
* fix ut
* address partial comments
* Code review feedback
* add layout config and docs
* fix ut
* rename hoodie.layout and rebase master
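
For readers unfamiliar with bucket indexing, the general idea of mapping an index key to a bucket with a hash function can be sketched as follows. This is a generic illustration under stated assumptions: the class `BucketIdSketch`, the use of `String.hashCode()`, and the modulo scheme are not taken from this log and may differ from the hash function added in this change.

```java
// Hypothetical helper: maps an index key to one of numBuckets buckets.
final class BucketIdSketch {
    static int bucketId(String indexKey, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        // Mask off the sign bit so the result of the modulo is never negative.
        return (indexKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }
}
```

Because the mapping depends only on the key and the bucket count, the same key always lands in the same bucket, which is what allows a record to be located without a separate lookup structure.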
Co-authored-by: Vinoth Chandar <vinoth@apache.org>