- When append() is not supported, rollover to new file always (instead of failing)
- Provide way to configure archive log folder (avoids small files inside .hoodie)
- Datasets written via Spark datasource archive to .hoodie/archived
- HoodieClientExample will now retain only 2,3 commits to exercise archival path during dev cycles
- Few tweaks to code structure around CommitArchiveLog
- Reviving PR 191, to make FileSystem creation off actual path
- Streamline all filesystem access to HoodieTableMetaClient
- Hadoop Conf from Spark Context serialized & passed to executor code too
- Pick up env vars prefixed with HOODIE_ENV_ into Configuration object
- Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS
- Adding s3a to supported schemes & support escaping "." in env vars
- Tests use HoodieTestUtils.getDefaultHadoopConf
- Write with COW/MOR paths work fully
- Read with RO view works on both storages*
- Incremental view supported on COW
- Refactored out HoodieReadClient methods, to just contain key based access
- HoodieDataSourceHelpers class can be now used to construct inputs to datasource
- Tests in hoodie-client using new helpers and mechanisms
- Basic tests around save modes & insert/upserts (more to follow)
- Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest
- Updated documentation to describe usage
- New sample app written using the DataSource API
Guava deprecated hashString(String) in v15, and removed it in v16.
Replace call with hashUnencodedString(String), which replace it, to
be compatible with newer versions of Guava.
- keys compared lexicographically using String::compareTo
- Range metadata additionally written into parquet file footers
- Trim fat & few optimizations to speed up indexing
- Add param to control whether input shall be cached, to speed up lookup
- Add param to turn on/off range pruning
- Auto compute of parallelism now simply factors in amount of comparisons done
- More accurate parallelism computation when range pruning is on
- tests added & hardened, docs updated
- Merged all filter* and get* methods
- new constructor takes filestatus[]
- All existing tests pass
- FileGroup is all files that belong to a fileID within a partition
- FileSlice is a generation of data and log files, starting at a base commit