HUDI specific validation of schema evolution should ensure that a newer schema can be used for the dataset by checking that the data written using the old schema can be read using the new schema.
Code changes:
1. Added a new config in HoodieWriteConfig to enable schema validation check (disabled by default)
2. Moved code that reads schema from base/log files into hudi-common from hudi-hive-sync
3. Added writerSchema to the extraMetadata of compaction commits in MOR table. This is same as that for commits on COW table.
Testing changes:
4. Extended TestHoodieClientBase to add insertBatch API which allows inserting a new batch of unique records into a HUDI table
5. Added a unit test to verify schema evolution for both COW and MOR tables.
6. Added unit tests for schema compatiblity checks.
When using HiveDriver mode in HudiHiveClient, Hive 2.x DDL operations like ALTER PARTITION may fail. This is because Hive 2.x doesn't like `db`.`table_name` for operations. In this fix, we set the name of the database in the SessionState create for the Driver.
This is to ensure that tests will execute all code paths, even the ones
written under DEBUG log levels. This will improve coverage as well as
ensure there are no surprised when DEBUG log level is enabled in
production.
- Brings more order and cohesion to the classes in hudi-common
- Utils classes related to a particular concept (avro, timeline,...) are placed near to the package
- common.fs package now contains all the filesystem level classes including wrapper filesystem
- bloom.filter package renamed to just bloom
- config package contains classes that help store properties
- common.fs.inline package contains all the inline filesystem classes/impl
- common.table.timeline now consolidates all timeline related classes
- common.table.view consolidates all the classes related to filesystem view metadata
- common.table.timeline.versioning contains all classes related to versioning of timeline
- Fix few unit tests as a result
- Moved the test packages around to match the source file move
- Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos