- Generalized BloomIndex to work with file ids instead of paths
- Abstracted away Bloom filter checking into HoodieLookupHandle
- Abstracted away range information retrieval into HoodieRangeInfoHandle
- For implicit indexes (e.g. BloomIndex), don't buffer up written records
- By default, only collect 10% of failing records to avoid OOMs
- This also improves debuggability, since data errors can now show up in collect()
- Added unit tests, fixed subclasses, and adjusted existing tests
- Check to ensure written files are listable on storage
- Docs updated to explain how this helps with S3 storage
- Added unit tests and corrected existing tests
- Fix DeltaStreamer to manage archived commits in a separate folder
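
The 10% sampling of failing records noted above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the class `FailedRecordSampler`, its method names, and the seeded `Random` are all hypothetical. The idea is to count every failure but retain only a random fraction of the failing record keys, so memory use stays bounded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: counts every failure, keeps only a sampled fraction.
public class FailedRecordSampler {
    private final double failureFraction;   // e.g. 0.1 keeps roughly 10%
    private final Random random;
    private final List<String> sampledFailures = new ArrayList<>();
    private long totalFailures = 0;

    public FailedRecordSampler(double failureFraction, long seed) {
        if (failureFraction < 0.0 || failureFraction > 1.0) {
            throw new IllegalArgumentException("failureFraction must be in [0, 1]");
        }
        this.failureFraction = failureFraction;
        this.random = new Random(seed);
    }

    // Always bump the counter; retain the key only with probability failureFraction.
    public void markFailure(String recordKey) {
        totalFailures++;
        if (random.nextDouble() < failureFraction) {
            sampledFailures.add(recordKey);
        }
    }

    public long getTotalFailures() {
        return totalFailures;
    }

    public List<String> getSampledFailures() {
        return Collections.unmodifiableList(sampledFailures);
    }

    public static void main(String[] args) {
        FailedRecordSampler sampler = new FailedRecordSampler(0.1, 42L);
        for (int i = 0; i < 10_000; i++) {
            sampler.markFailure("key-" + i);
        }
        System.out.println("total failures: " + sampler.getTotalFailures());
        System.out.println("retained samples: " + sampler.getSampledFailures().size());
    }
}
```

Keeping the total count separate from the retained sample means error totals stay exact even though only a subset of failing records is held in memory.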
The code-style rules follow Google style with some changes:
1. Increase line length from 100 to 120
2. Disable Javadoc-related Checkstyle checks, as fixing them needs more manual work.
Both source and test code are checked for code style.