- Introduced a thin abstraction, ActionExecutor, that all actions will implement
- Pulled cleaning code from table and writeclient into a single package
- CleanHelper is now CleanPlanner, HoodieCleanClient is no longer around
- Minor refactor of HoodieTable factory method
- HoodieTable.create() methods with and without metaclient passed in
- HoodieTable constructor no longer does a redundant instantiation
- Fixed existing unit tests to work at the HoodieWriteClient level
- Brings more order and cohesion to the classes in hudi-common
- Utils classes related to a particular concept (avro, timeline, ...) are placed near the corresponding package
- common.fs package now contains all the filesystem level classes including wrapper filesystem
- bloom.filter package renamed to just bloom
- config package contains classes that help store properties
- common.fs.inline package contains all the inline filesystem classes/impl
- common.table.timeline now consolidates all timeline related classes
- common.table.view consolidates all the classes related to filesystem view metadata
- common.table.timeline.versioning contains all classes related to versioning of timeline
- Fixed a few unit tests as a result
- Moved the test packages around to match the source file move
- Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos
- Storage Type replaced with Table Type (remaining instances)
- View types replaced with query types
- ReadOptimized view referred to as Snapshot Query
- TableFileSystemView sub interfaces renamed to BaseFileOnly and Slice Views
- HoodieDataFile renamed to HoodieBaseFile
- Hive Sync tool will register RO tables for MOR with a `_ro` suffix
- Datasource/Deltastreamer options renamed accordingly
- Support fallback to old config values as well, so migration is painless
- Config for controlling _ro suffix addition
- Renaming DataFile to BaseFile across DTOs, HoodieFileSlice and AbstractTableFileSystemView
- Docs were talking about storage types before, cWiki moved to "Table"
- Most of code already has HoodieTable, HoodieTableMetaClient - correct naming
- Replacing remaining uses of dataset across code/comments
- A few usages in comments and uses of Spark SQL DataSet remain untouched
- Add a transformer class that adds an `Op` field if not found in the input frame
- Add a payload implementation that issues deletes when Op=D
- Remove Parquet as a top level source type, consolidate with RowSource
- Made delta streamer work without a property file, simply using overridden CLI options
- Unit tests for transformer/payload classes
- Introduced configs for bloom filter type
- Implemented dynamic bloom filter with configurable max number of keys
- BloomFilterFactory abstraction; defaults to the current simple bloom filter
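
The bloom filter items above (a configurable filter type, a dynamic filter bounded by a maximum number of keys, and a factory that defaults to the simple filter) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the code from this change: every class name, the `"SIMPLE"`/`"DYNAMIC"` type strings, the sizing constants, and the hashing scheme are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// All names below are hypothetical; they do not mirror the real classes.
interface SketchBloomFilter {
    void add(String key);
    boolean mightContain(String key);
}

// Fixed-size filter: a bit set probed at numHashes positions per key.
final class SimpleSketchFilter implements SketchBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleSketchFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Double hashing derived from String.hashCode(); illustrative only.
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    @Override
    public void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    @Override
    public boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }
}

// Dynamic filter: appends a new fixed-size filter whenever the current one
// has taken keysPerFilter keys, and stops growing at maxFilters.
final class DynamicSketchFilter implements SketchBloomFilter {
    private final List<SimpleSketchFilter> filters = new ArrayList<>();
    private final int keysPerFilter;
    private final int maxFilters;
    private final int numBits;
    private final int numHashes;
    private int keysInCurrent = 0;

    DynamicSketchFilter(int keysPerFilter, int maxFilters, int numBits, int numHashes) {
        this.keysPerFilter = keysPerFilter;
        this.maxFilters = maxFilters;
        this.numBits = numBits;
        this.numHashes = numHashes;
        filters.add(new SimpleSketchFilter(numBits, numHashes));
    }

    @Override
    public void add(String key) {
        if (keysInCurrent >= keysPerFilter && filters.size() < maxFilters) {
            filters.add(new SimpleSketchFilter(numBits, numHashes));
            keysInCurrent = 0;
        }
        filters.get(filters.size() - 1).add(key);
        keysInCurrent++;
    }

    @Override
    public boolean mightContain(String key) {
        for (SimpleSketchFilter f : filters) {
            if (f.mightContain(key)) {
                return true;
            }
        }
        return false;
    }
}

// Factory keyed on a config-style type string; anything other than
// "DYNAMIC" falls back to the simple filter.
final class SketchBloomFilterFactory {
    static SketchBloomFilter create(String type, int expectedKeys, int maxKeys) {
        int perFilter = Math.max(1, expectedKeys);
        int numBits = Math.max(64, perFilter * 10); // assumed ~10 bits per key
        int numHashes = 7;                          // assumed hash count
        if ("DYNAMIC".equalsIgnoreCase(type)) {
            int maxFilters = Math.max(1, (maxKeys + perFilter - 1) / perFilter);
            return new DynamicSketchFilter(perFilter, maxFilters, numBits, numHashes);
        }
        return new SimpleSketchFilter(numBits, numHashes);
    }
}
```

In this sketch, capping growth at `maxFilters` keeps the serialized filter size bounded; once the cap is reached, further keys go into the last filter, so the false positive rate rises but lookups for added keys still never return false.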