- Reviving PR 191, to create FileSystem instances off the actual path
- Streamline all filesystem access through HoodieTableMetaClient
- Hadoop Conf from Spark Context serialized & passed to executor code too
- Pick up env vars prefixed with HOODIE_ENV_ into Configuration object
- Clean up usage of FSUtils.getFS by piggybacking on HoodieTableMetaClient.getFS
- Adding s3a to supported schemes & support for escaping "." in env var names
- Tests use HoodieTestUtils.getDefaultHadoopConf
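The env-var pickup described above could look roughly like the sketch below; the `_DOT_` escape token and the plain `Map` standing in for Hadoop's `Configuration` are assumptions for illustration, not necessarily the exact implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class EnvConfLoader {
    static final String PREFIX = "HOODIE_ENV_";

    // Copy HOODIE_ENV_-prefixed entries from the given environment into a
    // flat config map, un-escaping "_DOT_" back to "." (shell env var names
    // cannot contain dots). Pass System.getenv() in real use.
    static Map<String, String> fromEnv(Map<String, String> env) {
        Map<String, String> conf = new HashMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                String key = e.getKey().substring(PREFIX.length()).replace("_DOT_", ".");
                conf.put(key, e.getValue());
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key", "abc");
        env.put("PATH", "/usr/bin"); // ignored: no HOODIE_ENV_ prefix
        System.out.println(fromEnv(env).get("fs.s3a.access.key")); // prints abc
    }
}
```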
* Add analytics to site
* Fix ugly favicon
* New & Improved HoodieDeltaStreamer
- Can incrementally consume from HDFS or Kafka, with exactly-once semantics!
- Supports JSON & Avro data; a Source implementation can also handle custom formats
- Sources are fully pluggable, instantiated via reflection
- Key generation is pluggable; SimpleKeyGenerator added so far
- Schema provider is pluggable; currently file-based schemas
- Configurable field to break ties during preCombine
- Finally, the HoodieRecordPayload can also be plugged in, enabling merge strategies other than plain overwrite
- Handles efficient Avro serialization in Spark
Pending:
- Rewriting of HiveIncrPullSource
- Hive sync via hoodie-hive
- Cleanup & tests
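The reflection-based pluggability above (Sources, key generators) can be sketched as follows; the interface shape and the `_row_key` field name are illustrative assumptions, not the actual Hoodie API.

```java
import java.util.HashMap;
import java.util.Map;

public class PluginLoader {
    // Illustrative stand-in for a pluggable key-generator contract.
    public interface KeyGenerator {
        String getKey(Map<String, String> record);
    }

    // A trivial implementation in the spirit of SimpleKeyGenerator:
    // reads the record key straight out of a fixed field.
    public static class SimpleKeyGenerator implements KeyGenerator {
        public String getKey(Map<String, String> record) {
            return record.get("_row_key"); // field name is an assumption
        }
    }

    // Reflection-based plugin loading: any class name supplied via config
    // can be instantiated, as long as it implements the interface.
    static KeyGenerator load(String className) throws Exception {
        return (KeyGenerator) Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = load(SimpleKeyGenerator.class.getName());
        Map<String, String> rec = new HashMap<>();
        rec.put("_row_key", "key1");
        System.out.println(kg.getKey(rec)); // prints key1
    }
}
```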
* Minor fixes from master rebase
* Implementation of HiveIncrPullSource
- Copies commit by commit from source to target
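The commit-by-commit copy can be sketched as a checkpointed walk over an ordered commit timeline; the method names and string-ordered commit times here are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class IncrPullSketch {
    // Commit times sort lexicographically (e.g. "20170301120000"), so the
    // timeline is a sorted set of strings. Given the last checkpointed
    // commit, return the commits still to be copied, oldest first.
    static List<String> nextCommits(SortedSet<String> timeline, String lastCheckpoint) {
        List<String> next = new ArrayList<>();
        for (String commit : timeline) {
            if (lastCheckpoint == null || commit.compareTo(lastCheckpoint) > 0) {
                next.add(commit);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        SortedSet<String> timeline = new TreeSet<>(Arrays.asList("001", "002", "003"));
        // Checkpoint "001" already copied; copy the rest in commit order.
        System.out.println(nextCommits(timeline, "001")); // prints [002, 003]
    }
}
```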
* Adding TimestampBasedKeyGenerator
- Supports unix time & date strings
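Handling both unix time & date strings might look like the sketch below; the input/output formats (epoch seconds, ISO dates, a `yyyy/MM/dd` partition path) are assumptions for illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampPartitionSketch {
    static final DateTimeFormatter OUT = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Accept either unix epoch seconds or an ISO yyyy-MM-dd date string,
    // and emit a yyyy/MM/dd partition path (UTC). Formats are assumptions.
    static String toPartitionPath(String value) {
        try {
            long epochSec = Long.parseLong(value);
            return Instant.ofEpochSecond(epochSec).atZone(ZoneOffset.UTC).toLocalDate().format(OUT);
        } catch (NumberFormatException e) {
            return LocalDate.parse(value).format(OUT);
        }
    }

    public static void main(String[] args) {
        System.out.println(toPartitionPath("0"));          // prints 1970/01/01
        System.out.println(toPartitionPath("2017-03-01")); // prints 2017/03/01
    }
}
```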