Summary:
This fix ensures for UPSERT operation, '--filter-dupes' is disabled and fails fast if not. Otherwise it would drop all updates silently and only take in new records.
- Introduced a thin abstraction ActionExecutor, that all actions will implement
- Pulled cleaning code from table, writeclient into a single package
- CleanHelper is now CleanPlanner, HoodieCleanClient is no longer around
- Minor refactor of HoodieTable factory method
- HoodieTable.create() methods with and without metaclient passed in
- HoodieTable constructor now does not do a redundant instantiation
- Fixed existing unit tests to work at the HoodieWriteClient level
When using HiveDriver mode in HudiHiveClient, Hive 2.x DDL operations like ALTER PARTITION may fail. This is because Hive 2.x doesn't like `db`.`table_name` for operations. In this fix, we set the name of the database in the SessionState create for the Driver.
This is to ensure that tests will execute all code paths, even the ones
written under DEBUG log levels. This will improve coverage as well as
ensure there are no surprised when DEBUG log level is enabled in
production.
1. getRecordsUsingInputFormat() can take a custom Configuration which can be used to specify HUDI table properties (e.g. <table>.consume.mode or <table>.consume.start.timestamp)
2. Fixed the return to return an empty List rather than raise an Exception if no records are found
- Brings more order and cohesion to the classes in hudi-common
- Utils classes related to a particular concept (avro, timeline,...) are placed near to the package
- common.fs package now contains all the filesystem level classes including wrapper filesystem
- bloom.filter package renamed to just bloom
- config package contains classes that help store properties
- common.fs.inline package contains all the inline filesystem classes/impl
- common.table.timeline now consolidates all timeline related classes
- common.table.view consolidates all the classes related to filesystem view metadata
- common.table.timeline.versioning contains all classes related to versioning of timeline
- Fix few unit tests as a result
- Moved the test packages around to match the source file move
- Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos
* Refactor exporter main logic
* break main method into multiple readable methods
* fix bug of passing wrong file list
* avoid deleting output path when exists
* throw exception to early abort on multiple cases
* use JavaSparkContext instead of SparkSession
* improve unit test for expected exceptions