- This adds a restore plan and serializes it to the restore.requested meta file in the timeline. This also introduces separate schedule and execution phases for restore, which were not present before.
* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
The timestamp-based key generator was returning different values for the row writer and non-row-writer paths. This patch fixes that; the fix is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`).
* [HUDI-2154] Add index key field to HoodieKey
* [HUDI-2157] Add the bucket index and its read/write implementation for the Spark engine.
* revert HUDI-2154 add index key field to HoodieKey
* fix all comments and introduce a new tricky way to get index key at runtime
support double insert for bucket index
* revert spark read optimizer based on bucket index
* add the storage layout
* index tag, hash function and add ut
* fix ut
* address partial comments
* Code review feedback
* add layout config and docs
* fix ut
* rename hoodie.layout and rebase master
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
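
As a rough illustration of the hash-based bucket assignment behind the bucket index above, here is a minimal Python sketch. It assumes a CRC32 hash and a fixed bucket count; `bucket_id` and both of its parameters are hypothetical names, and the hash function Hudi actually uses may differ.

```python
import zlib


def bucket_id(index_key: str, num_buckets: int) -> int:
    """Map an index key to a bucket number in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # CRC32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(index_key.encode("utf-8")) % num_buckets
```

Because the mapping is deterministic, inserting the same key twice lands in the same bucket, so no index lookup is needed to locate an existing record.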
* `ZCurveOptimizeHelper` > `ZOrderingIndexHelper`;
Moved Z-index helper under `hudi.index.zorder` package
* Tidying up `ZOrderingIndexHelper`
* Fixing compilation
* Fixed index new/original table merging sequence to always prefer values from new index;
Cleaned up `HoodieSparkUtils`
* Added test for `mergeIndexSql`
* Abstracted Z-index name composition w/in `ZOrderingIndexHelper`;
* Fixed `DataSkippingUtils` to interrupt pruning in case data filter contains non-indexed column reference
* Properly handle exceptions originating during pruning in `HoodieFileIndex`
* Make sure no errors are logged upon encountering `AnalysisException`
* Cleaned up Z-index updating sequence;
Tidying up comments, java-docs;
* Fixed Z-index to properly handle changes of the list of clustered columns
* Tidying up
* `lint`
* Suppressing `JavaDocStyle` first sentence check
* Fixed compilation
* Fixing incorrect `DecimalType` conversion
* Refactored test `TestTableLayoutOptimization`
- Added Z-index table composition test (against fixtures)
- Separated out GC test;
Tidying up
* Fixed tests re-shuffling column order for Z-Index table `DataFrame` to align w/ the one loaded from JSON
* Scaffolded `DataTypeUtils` to do basic checks of Spark types;
Added proper compatibility checking b/w old/new index-tables
* Added test for Z-index tables merging
* Fixed import being shaded by creating internal `hudi.util` package
* Fixed packaging for `TestOptimizeTable`
* Revised `updateMetadataIndex` seq to provide Z-index updating process w/ source table schema
* Make sure existing Z-index table schema is sync'd to source table's one
* Fixed shaded refs
* Fixed tests
* Fixed type conversion of Parquet provided metadata values into Spark expected schemas
* Fixed `composeIndexSchema` utility to propose proper schema
* Added more tests for Z-index:
- Checking that Z-index table is built correctly
- Checking that Z-index tables are merged correctly (during update)
* Fixing source table
* Fixing tests to read from Parquet w/ proper schema
* Refactored `ParquetUtils` utility reading stats from Parquet footers
* Fixed incorrect handling of Decimals extracted from Parquet footers
* Worked around issues in javac failing to compile stream's collection
* Fixed handling of `Date` type
* Fixed handling of `DateType` to be parsed as `LocalDate`
* Updated fixture;
Make sure test loads Z-index fixture using proper schema
* Removed superfluous schema adjusting when reading from Parquet, since Spark is actually able to perfectly restore schema (given Parquet was previously written by Spark as well)
* Fixing race-condition in Parquet's `DateStringifier` trying to share `SimpleDateFormat` object which is inherently not thread-safe
* Tidying up
* Make sure schema is used upon reading to validate input files are in the appropriate format;
Tidying up;
* Worked around javac (1.8) inability to infer expression type properly
* Updated fixtures;
Tidying up
* Fixing compilation after rebase
* Assert that clustering has happened in Z-order layout optimization testing
* Tidying up exception messages
* XXX
* Added test validating Z-index lookup filter correctness
* Added more test-cases;
Tidying up
* Added tests for string expressions
* Fixed incorrect Z-index filter lookup translations
* Added more test-cases
* Added proper handling on complex negations of AND/OR expressions by pushing NOT operator down into inner expressions for appropriate handling
* Added `-target:jvm-1.8` for `hudi-spark` module
* Adding more tests
* Added tests for non-indexed columns
* Properly handle non-indexed columns by falling back to a re-write of containing expression as `TrueLiteral` instead
* Fixed tests
* Removing the parquet test files and disabling corresponding tests
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
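
The NOT push-down and the `TrueLiteral` fallback for non-indexed columns described above can be sketched as follows. This is a simplified Python illustration, not the Spark/Scala code from the patch: the expression classes, `push_not_down`, and `rewrite_for_index` are hypothetical names, and the sketch replaces only the leaf predicate rather than the containing expression.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Pred:
    column: str
    op: str  # one of "=", "!=", "<", ">=", ">", "<="
    value: object


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Not:
    child: "Expr"


@dataclass(frozen=True)
class TrueLiteral:
    pass


Expr = Union[Pred, And, Or, Not, TrueLiteral]

NEGATED_OP = {"=": "!=", "!=": "=", "<": ">=", ">=": "<", ">": "<=", "<=": ">"}


def push_not_down(expr, negate=False):
    """Eliminate NOT nodes by pushing negation to the leaves (De Morgan's laws)."""
    if isinstance(expr, Not):
        return push_not_down(expr.child, not negate)
    if isinstance(expr, (And, Or)):
        left = push_not_down(expr.left, negate)
        right = push_not_down(expr.right, negate)
        flipped = Or if isinstance(expr, And) else And
        return flipped(left, right) if negate else type(expr)(left, right)
    # Leaf predicate: flip the comparison operator when negated.
    return Pred(expr.column, NEGATED_OP[expr.op], expr.value) if negate else expr


def rewrite_for_index(expr, indexed_columns):
    """Push NOT down first, then relax predicates on non-indexed columns to TRUE."""
    def relax(e):
        if isinstance(e, Pred):
            return e if e.column in indexed_columns else TrueLiteral()
        return type(e)(relax(e.left), relax(e.right))

    return relax(push_not_down(expr))
```

The order matters: if a non-indexed predicate were replaced with TRUE before the negation was pushed down, an enclosing NOT would turn it into FALSE and could wrongly prune files.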
* [HUDI-2332] Add clustering and compaction in Kafka Connect Sink
* Disable validation check on instant time for compaction and adjust configs
* Add javadocs
* Add clustering and compaction config
* Fix transaction causing missing records in the target table
* Add debugging logs
* Fix kafka offset sync in participant
* Adjust how clustering and compaction are configured in kafka-connect
* Fix clustering strategy
* Remove irrelevant changes from other published PRs
* Update clustering logic and others
* Update README
* Fix test failures
* Fix indentation
* Fix clustering config
* Add JavaCustomColumnsSortPartitioner and make async compaction enabled by default
* Add test for JavaCustomColumnsSortPartitioner
* Add more changes after IDE sync
* Update README with clarification
* Fix clustering logic after rebasing
* Remove unrelated changes
* [HUDI-2101] Support Z-order for Hudi
* Renaming some configs for consistency/simplicity.
* Minor code cleanups
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
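
For readers unfamiliar with Z-ordering, the core idea is to interleave the bits of several column values into one sort key, so that rows close in every dimension end up close in the file layout. The sketch below is a generic two-column illustration in Python, assuming non-negative integers; `z_value` is a hypothetical name and this is not the encoding used by the patch, which also has to handle other data types.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into a single Z-order key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return z
```

Sorting rows by this key before writing keeps the min/max ranges of both columns narrow within each file, which is what makes per-file statistics useful for data skipping.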
- There are two code paths where we take the lock twice. This was introduced when data table locks were added for updating the metadata table. Fix those flows to avoid taking the lock if a parent transaction has already acquired it.
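
The double-locking fix above can be pictured with a small Python sketch: a nested operation skips lock acquisition when told that the enclosing transaction already holds the lock. All names here (`transaction`, `parent_holds_lock`, `update_metadata_table`, `commit`) are hypothetical and only illustrate the pattern; they are not the Hudi API.

```python
import threading
from contextlib import contextmanager


@contextmanager
def transaction(lock, parent_holds_lock=False):
    """Run a block under `lock`, unless a parent transaction already holds it."""
    if parent_holds_lock:
        yield
    else:
        with lock:
            yield


def update_metadata_table(lock, log, parent_holds_lock=False):
    with transaction(lock, parent_holds_lock):
        log.append("metadata updated")


def commit(lock, log):
    with transaction(lock):
        log.append("data commit started")
        # threading.Lock is not re-entrant: acquiring it again here would deadlock.
        update_metadata_table(lock, log, parent_holds_lock=True)
        log.append("data commit completed")
    return log
```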
* [HUDI-2285] Adding synchronous updates to metadata before completion of commits in data timeline.
- This patch adds synchronous updates to the metadata table. In other words, every write is first committed to the metadata table and then to the data table. While reading the metadata table, we ignore any delta commits that are present only in the metadata table and not in the data table timeline.
- Compaction of the metadata table is fenced by the condition that we trigger compaction only when there are no inflight requests in the data table. This ensures that all base files in the metadata table are always in sync with the data table (w/o any holes); the only discrepancy can be some extra invalid commits among the delta log files in the metadata table.
- Due to this, archival of the data table also fences itself up until the compacted instant in the metadata table.
All writes to the metadata table happen within the data table lock, so the metadata table works in single-writer mode only. This might be tough to loosen, since all writers write to the same FILES partition and would conflict anyway.
- As part of this, added lock acquisition in the data table for operations that previously committed without one (rollback, clean, compaction, cluster). Note that we are not doing any conflict resolution here; we only take a lock while committing, so that the metadata table always has a single writer.
- Also added a building block for adding buckets to partitions, which will be leveraged by other indexes like the record level index. For now, the FILES partition has only one bucket. In general, any number of buckets per partition is allowed, and each partition has a fixed fileId prefix with an incremental suffix for each bucket within the partition.
Also fixed [HUDI-2476]: retry a failed compaction if it succeeded in the metadata table the first time but failed in the data table.
- Enabling metadata table by default.
- Adding more tests for metadata table
Co-authored-by: Prashant Wason <pwason@uber.com>
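
The read-side filtering and the compaction fence described for HUDI-2285 above can be sketched in a few lines of Python. This is a simplified model in which instants are plain timestamp strings; the function names are hypothetical and do not correspond to Hudi classes.

```python
def valid_metadata_commits(metadata_delta_commits, completed_data_instants):
    """Keep only metadata delta commits that also completed on the data timeline."""
    completed = set(completed_data_instants)
    return [instant for instant in metadata_delta_commits if instant in completed]


def can_compact_metadata_table(inflight_data_instants):
    """Compaction of the metadata table is allowed only with no inflight data writes."""
    return len(inflight_data_instants) == 0
```

A write that reached the metadata table but failed before completing on the data table is simply skipped by readers, which is why committing to the metadata table first is safe.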
- Inserts go into logs, hashed by Kafka and Hudi partitions
- Fixed issues with the setupKafka script
- Bumped up the default commit interval to 300 seconds
- Minor renaming
- This patch introduces a rollback plan and the rollback.requested instant. Rollback is done in two phases: rollback planning and rollback action. In the planning phase, we prepare the rollback plan and serialize it to rollback.requested. In the action phase, we fetch details from the plan and just delete the files listed in it. This ensures that the final rollback commit metadata contains all files that got rolled back, even if the rollback failed midway and was retried.
- Fixing packaging, naming of classes
- Use of log4j over slf4j for uniformity
- More follow-on fixes
- Added a version to control/coordinator events.
- Eliminated the config added to write config
- Fixed fetching of checkpoints based on table type
- Clean up of naming, code placement
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
- Rollback infers the directory structure and does rollback based on the strategy used while markers were written. "write markers type" in write config is used to determine marker strategy only for new writes.
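
The two-phase rollback introduced above (a plan serialized to rollback.requested, followed by an action phase that only follows the plan) can be sketched as below. This is a minimal Python illustration under stated assumptions: the plan is stored as JSON for simplicity, the file naming is made up, and `schedule_rollback` and `execute_rollback` are hypothetical names.

```python
import json
import os
import tempfile


def schedule_rollback(timeline_dir, instant, files_to_delete):
    """Planning phase: persist the list of files to delete before touching anything."""
    plan_path = os.path.join(timeline_dir, f"{instant}.rollback.requested")
    with open(plan_path, "w") as f:
        json.dump({"files": sorted(files_to_delete)}, f)
    return plan_path


def execute_rollback(plan_path):
    """Action phase: delete exactly what the plan lists; safe to re-run after a failure."""
    with open(plan_path) as f:
        plan = json.load(f)
    for path in plan["files"]:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # already deleted by an earlier, interrupted attempt
    # The result always reflects the full plan, not just what this attempt deleted.
    return plan["files"]
```

Because the retry reads the same plan, the reported set of rolled-back files does not depend on how far the first attempt got.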
* [HUDI-1292] Created a config to enable/disable syncing of metadata table.
- Metadata Table should only be synced from a single pipeline to prevent conflicts.
- Skip syncing metadata table for clustering and compaction
- Renamed useFileListingMetadata
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
Main functions:
Support CREATE TABLE for hoodie.
Support CTAS.
Support INSERT for hoodie, including dynamic-partition and static-partition inserts.
Support MERGE INTO for hoodie.
Support DELETE.
Support UPDATE.
All of the above support both Spark 2 and Spark 3, based on DataSourceV1.
Main changes:
Add sql parser for spark2.
Add HoodieAnalysis for sql resolve and logical plan rewrite.
Add command implementations for CREATE TABLE, INSERT, MERGE INTO & CTAS.
In order to push down the update & insert logic to the HoodieRecordPayload for MergeInto, I made some changes to the HoodieWriteHandler and other related classes:
1. Add the inputSchema for parsing the incoming record. This is because the inputSchema for MergeInto is different from the writeSchema, as there are some transforms in the update & insert expressions.
2. Add WRITE_SCHEMA to HoodieWriteConfig to pass the write schema for MergeInto.
3. Pass properties to HoodieRecordPayload#getInsertValue to pass the insert expression and table schema.
Verify this pull request
Add TestCreateTable to test creating hoodie tables and CTAS.
Add TestInsertTable to test inserting into hoodie tables.
Add TestMergeIntoTable to test merging into hoodie tables.
Add TestUpdateTable to test updating hoodie tables.
Add TestDeleteTable to test deleting from hoodie tables.
Add TestSqlStatement to test the currently supported DDL/DML.
* [HUDI-845] Added locking capability to allow multiple writers
1. Added LockProvider API for pluggable lock methodologies
2. Added Resolution Strategy API to allow for pluggable conflict resolution
3. Added TableService client API to schedule table services
4. Added Transaction Manager for wrapping actions within transactions
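
To make the idea of a pluggable conflict resolution strategy concrete, here is a small Python sketch. The class and method names are hypothetical and are not the interfaces added by this commit; the file-overlap rule is only one possible strategy, shown as an assumption.

```python
from abc import ABC, abstractmethod


class ConflictResolutionStrategy(ABC):
    """Decides whether two concurrent writers conflict (hypothetical interface)."""

    @abstractmethod
    def has_conflict(self, my_file_ids: set, other_file_ids: set) -> bool:
        ...


class FileOverlapStrategy(ConflictResolutionStrategy):
    """Treat two writes as conflicting when they touched at least one common file."""

    def has_conflict(self, my_file_ids: set, other_file_ids: set) -> bool:
        return not my_file_ids.isdisjoint(other_file_ids)
```

A writer would consult the configured strategy while holding the lock, just before committing, and abort if a concurrently completed write conflicts with its own.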
This is step 2 of RFC-24:
https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal
This PR introduces a BucketAssigner that assigns a bucket ID (partition
path & fileID) to each stream record.
The downstream pipeline no longer needs to look up the index and
partition these records:
the write target location is decided before the write, and each
record's location is computed when the BucketAssigner receives it, so
the indexing happens in a streaming style.
Computing locations for a whole batch of records at once is resource
intensive and puts pressure on the engine,
so we should avoid it in a streaming system.
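
The per-record assignment described above can be sketched as follows. This is a toy Python model, not the Flink operator from the PR: it keeps state in plain dictionaries, rolls over to a new file after a fixed record count, and uses a made-up file ID format. The class shares the BucketAssigner name only for readability; `max_records_per_file` and `assign` are hypothetical.

```python
class BucketAssigner:
    """Assigns (partition path, file ID) to each record as it arrives."""

    def __init__(self, max_records_per_file: int = 2):
        self.max_records_per_file = max_records_per_file
        self.locations = {}   # record key -> (partition path, file ID)
        self.open_files = {}  # partition path -> (file index, record count)

    def assign(self, record_key: str, partition_path: str):
        # A known key is an update: route it to the file that already holds it.
        if record_key in self.locations:
            return self.locations[record_key]
        index, count = self.open_files.get(partition_path, (0, 0))
        if count >= self.max_records_per_file:
            index, count = index + 1, 0  # current file is full, start a new one
        self.open_files[partition_path] = (index, count + 1)
        location = (partition_path, f"{partition_path}-file-{index}")
        self.locations[record_key] = location
        return location
```

Each call does a constant amount of work, so the cost of locating records is spread across the stream instead of being paid for a whole batch at once.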
- Syncing to metadata table, setting operation type, starting async cleaner done in preWrite()
- Fixes an issue where delete() was not starting async cleaner correctly
- Fixed tests and enabled metadata table for TestAsyncCompaction
- Introduced an internal metadata table, that stores file listings.
- metadata table is kept up to date with
- Fixed handling of CleanerPlan.
- [HUDI-842] Reduce parallelism to speed up the test.
- [HUDI-842] Implementation of CLI commands for metadata operations and lookups.
- [HUDI-842] Extend rollback metadata to include the files which have been appended to.
- [HUDI-842] Support for rollbacks in MOR Table.
- MarkerBasedRollbackStrategy needs to correctly provide the list of files for which rollback blocks were appended.
- [HUDI-842] Added unit test for rollback of partial commits (inflight but not completed yet).
- [HUDI-842] Handled the error case where metadata update succeeds but dataset commit fails.
- [HUDI-842] Schema evolution strategy for Metadata Table. Each type of metadata saved (FilesystemMetadata, ColumnIndexMetadata, etc.) will be a separate field with default null. The type of the record will identify the valid field. This way, we can grow the schema when a new type of information is saved within it, while still keeping it backward compatible.
- [HUDI-842] Fix non-partitioned case and speed up initial creation of metadata table. Choose only 1 partition for jsc, as the number of records is low (hundreds to thousands). Creating a large number of partitions for a JavaRDD adds overhead and slows down operations like WorkloadProfile.
For the non-partitioned case, use "." as the name of the partition to prevent empty keys in HFile.
- [HUDI-842] Reworked metrics publishing.
- Code has been split into reader and writer side. HoodieMetadata code is to be accessed via HoodieTable.metadata(), which returns the metadata instance for the table.
Code is serializable to allow executors to use the functionality.
- [RFC-15] Add metrics to track the time for each file system call.
- [RFC-15] Added a distributed metrics registry for spark which can be used to collect metrics from executors. This helps create a stats dashboard which shows the metadata table improvements in real-time for production tables.
- [HUDI-1321] Created HoodieMetadataConfig to specify configuration for the metadata table. This is safer than exposing full-fledged properties for the metadata table (like HoodieWriteConfig), which would make the metadata table burdensome to tune. With limited configuration, we can control the performance of the metadata table closely.
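
The schema evolution strategy for the metadata table described above (one nullable field per metadata type, with the record type selecting the valid field) can be illustrated with a small Python sketch. The type IDs, field names, and the `payload` helper are hypothetical; the real records are Avro, not dataclasses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataRecord:
    key: str
    record_type: int
    # One optional field per kind of metadata; new kinds are added with default None.
    filesystem_metadata: Optional[dict] = None
    column_index_metadata: Optional[dict] = None


# Hypothetical mapping from record type to the field that carries its payload.
PAYLOAD_FIELD_BY_TYPE = {1: "filesystem_metadata", 2: "column_index_metadata"}


def payload(record: MetadataRecord):
    """Return the one field that is valid for this record's type."""
    return getattr(record, PAYLOAD_FIELD_BY_TYPE[record.record_type])
```

A record written before `column_index_metadata` existed still loads, because the missing field falls back to its null default.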
[HUDI-1319][RFC-15] Adding interfaces for HoodieMetadata, HoodieMetadataWriter (apache#2266)
- moved MetadataReader to HoodieBackedTableMetadata, under the HoodieTableMetadata interface
- moved MetadataWriter to HoodieBackedTableMetadataWriter, under the HoodieTableMetadataWriter
- Pulled all the metrics into HoodieMetadataMetrics
- Writer now wraps the metadata, instead of extending it
- New enum for MetadataPartitionType
- Streamlined code flow inside HoodieBackedTableMetadataWriter w.r.t initializing metadata state
- [HUDI-1319] Make async operations work with metadata table (apache#2332)
- Changes the syncing model to only move over completed instants on data timeline
- Syncing happens postCommit and on writeClient initialization
- Latest delta commit on the metadata table is sufficient as the watermark for data timeline archival
- Cleaning/Compaction use a suffix to the last instant written to metadata table, such that we keep the 1-1 mapping between data and metadata timelines.
- Got rid of a lot of the complexity around checking for valid commits during open of base/log files
- Tests now use local FS, to simulate more failure scenarios
- Some failure scenarios exposed HUDI-1434, which is needed for MOR to work correctly
co-authored by: Vinoth Chandar <vinoth@apache.org>
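
The syncing model from HUDI-1319 above (only completed data instants are moved over, and the latest metadata delta commit acts as the archival watermark) can be approximated with the Python sketch below. It assumes instants are sortable timestamp strings and that they complete in order, which real timelines do not guarantee; both function names are hypothetical.

```python
def instants_to_sync(completed_data_instants, latest_metadata_instant):
    """Completed data instants that are newer than the last synced metadata instant."""
    return sorted(
        instant
        for instant in completed_data_instants
        if latest_metadata_instant is None or instant > latest_metadata_instant
    )


def archivable_instants(data_instants, latest_metadata_instant):
    """Data instants at or before the watermark; newer ones are still needed for syncing."""
    if latest_metadata_instant is None:
        return []
    return sorted(i for i in data_instants if i <= latest_metadata_instant)
```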