The code-style rules follow google style with some changes:
1. Increase line length from 100 to 120
2. Disable JavaDoc related checkstyles as this needs more manual work.
Both source and test code are checked for code-style
- Introduced concept of converters to be able to serde generic datatype for SpillableMap
- Fixed/Added configs to Hoodie Configs
- Changed HoodieMergeHandle to start using SpillableMap
- HoodieLogFormat V2 has support for LogFormat evolution through versioning
- LogVersion is associated with a LogBlock not a LogFile
- Based on a version for a LogBlock, approporiate code path is executed
- Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff
- Implemented Reverse pointer to be able to traverse the log in reverse
- Introduce new MAGIC for backwards compatibility with logs without versions
- Reviving PR 191, to make FileSystem creation off actual path
- Streamline all filesystem access to HoodieTableMetaClient
- Hadoop Conf from Spark Context serialized & passed to executor code too
- Pick up env vars prefixed with HOODIE_ENV_ into Configuration object
- Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS
- Adding s3a to supported schemes & support escaping "." in env vars
- Tests use HoodieTestUtils.getDefaultHadoopConf
- Write with COW/MOR paths work fully
- Read with RO view works on both storages*
- Incremental view supported on COW
- Refactored out HoodieReadClient methods, to just contain key based access
- HoodieDataSourceHelpers class can be now used to construct inputs to datasource
- Tests in hoodie-client using new helpers and mechanisms
- Basic tests around save modes & insert/upserts (more to follow)
- Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest
- Updated documentation to describe usage
- New sample app written using the DataSource API
- Merged all filter* and get* methods
- new constructor takes filestatus[]
- All existing tests pass
- FileGroup is all files that belong to a fileID within a partition
- FileSlice is a generation of data and log files, starting at a base commit
- clean/rollback/write paths covered by existing tests
- Snapshot copier fixed to copy metadata file also, and test fixed
- Existing tables need to be repaired by addition of metadata, before this can be rolled out
- Concurreny handled via taskID, failure recovery handled via renames
- Falls back to search 3 levels up
- Cli tool has command to add this to existing tables
- Introduce avro to save clean metadata with details about the last commit that was retained
- Save rollback metadata in the meta timeline
- Create savepoint metadata and add API to createSavepoint, deleteSavepoint and rollbackToSavepoint
- Savepointed commit should not be rolledback or cleaned or archived
- introduce cli commands to show, create and rollback to savepoints
- Write unit tests to test savepoints and rollbackToSavepoints
1. Create HoodieTable abstraction for commits and fileSystemView
2. HoodieMergeOnReadTable created
3. View is now always obtained from the table and the correct view based on the table type is returned
- Refactored timelines to be a single timeline for all active events and one for archived events. CommitTimeline and other timelines can be inferred by applying a filter on the activeTimelime
- Introduced HoodieInstant to abstract different types of action, commit time and if isInFlight
- Implemented other review comments