- Reviving PR 191, to make FileSystem creation off actual path
- Streamline all filesystem access to HoodieTableMetaClient
- Hadoop Conf from Spark Context serialized & passed to executor code too
- Pick up env vars prefixed with HOODIE_ENV_ into Configuration object
- Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS
- Adding s3a to supported schemes & support escaping "." in env vars
- Tests use HoodieTestUtils.getDefaultHadoopConf
- Merged all filter* and get* methods
- new constructor takes filestatus[]
- All existing tests pass
- FileGroup is all files that belong to a fileID within a partition
- FileSlice is a generation of data and log files, starting at a base commit
- Designed to be run by your workflow manager after hoodie upsert
- Assumes jdbc connectivity via HiveServer2, which should work with all major distros
Test logs > 4MB, which is a limit for travis-ci. Reducing the logs by setting appropriate log levels for tests
Add sudo: required on travis.yml to get more memory for running the tests. (https://github.com/travis-ci/travis-ci/issues/5926)
Fixed requirement that fsclient.lastDataFileForDataset always returns files in order