1. Went through the dependency list once to ensure compliance. Generated the current NOTICE list in all submodules (other Apache projects such as Flink do this).
To be conservative regarding licensing, NOTICE.txt lists all dependencies, including transitive ones. Pending compliance questions are reported in https://issues.apache.org/jira/browse/LEGAL-461
2. Automate generation of NOTICE.txt files so that future package compliance issues can be identified early as part of the code-review process.
3. Added NOTICE.txt and LICENSE.txt to all HUDI jars
(1) Apply transformation when using delta-streamer to ingest data.
(2) Add Hudi Incremental Source for Delta Streamer
(3) Allow delta-streamer config properties to be passed on the command line
(4) Add Hive integration to Delta-Streamer and address review comments
(5) Ensure MultiPartKeysValueExtractor handles Hive-style partition descriptions
(6) Reuse the same Spark session for both source and transformer
(7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource
(8) Reuse Binary Avro coders
(9) Add push down filter for Incremental source
(10) Add Hoodie DeltaStreamer metrics to track total time taken
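
Item (5) above concerns Hive-style partition descriptions, where each path segment is written as `key=value` rather than as a bare value. A minimal sketch of the idea follows. It is not the actual MultiPartKeysValueExtractor code: the helper name `extract_partition_values` is hypothetical, and it assumes partition paths are slash-separated.

```python
from typing import List


def extract_partition_values(partition_path: str) -> List[str]:
    """Return the partition values from a slash-separated partition path.

    Accepts both plain segments ("2018/01/02") and Hive-style segments
    ("year=2018/month=01/day=02"). Hypothetical helper, for illustration only.
    """
    values = []
    for segment in partition_path.strip("/").split("/"):
        if not segment:
            continue  # skip empty segments, e.g. from an empty path
        # For a Hive-style segment keep only the text after the first '='.
        _, sep, value = segment.partition("=")
        values.append(value if sep else segment)
    return values


print(extract_partition_values("year=2018/month=01/day=02"))  # ['2018', '01', '02']
print(extract_partition_values("2018/01/02"))                 # ['2018', '01', '02']
```

Both layouts yield the same list of values, which is what lets a single extractor serve datasets written with either convention.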
- Tests redone in the process
- Main changes are to RealtimeRecordReader and how it treats maps/arrays
- Make hive sync work with Hive 1/2 and CDH environments
- Fixes for corner cases in Hive queries
- Spark Hive integration - Working version across Apache and CDH versions
- Known Issue - https://github.com/uber/hudi/issues/439
- Standardize version of jackson
- DFSPropertiesConfiguration replaces usage of commons PropertiesConfiguration
- Remove dependency on ConstructorUtils
- Throw error if ordering value is not present, during key generation
- Switch to shade plugin for hoodie-utilities
- Added support for consuming data written with Confluent Avro Kafka serdes
- Support for Confluent schema registry
- KafkaSource now handles skew across partitions by allocating the source limit round-robin across them
- Added support for BULK_INSERT operations as well
- Pass in the payload class config properly into HoodieWriteClient
- Fix documentation based on new usage
- Adding tests on deltastreamer, sources and all new util classes.
- Fixes #246
- Bump up default parallelism to 1500, to handle large upserts
- Add docs on S3 configuration & tuning tips with tested Spark knobs
- Fix bug to not duplicate hoodie metadata fields when input dataframe is another hoodie dataset
- Improve speed of ROTablePathFilter by removing directory check
- Move to spark-avro 4.0 to handle issue with nested fields with same name
- Keep AvroConversionUtils in sync with spark-avro 4.0
- Writes via COW/MOR paths work fully
- Read with RO view works on both storages*
- Incremental view supported on COW
- Refactored out HoodieReadClient methods, to just contain key based access
- HoodieDataSourceHelpers class can now be used to construct inputs to datasource
- Tests in hoodie-client using new helpers and mechanisms
- Basic tests around save modes & insert/upserts (more to follow)
- Bumped up Scala to 2.11, since 2.10 is deprecated & complains with scalatest
- Updated documentation to describe usage
- New sample app written using the DataSource API
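
The KafkaSource entry above mentions round-robin allocation of the source limit across partitions to cope with skew. One way such an allocation could work is sketched below. This is an illustration under stated assumptions, not the KafkaSource implementation: the function name `allocate_source_limit` and the shape of its inputs (a map from partition id to the number of unread messages) are hypothetical.

```python
from typing import Dict


def allocate_source_limit(available: Dict[int, int], source_limit: int) -> Dict[int, int]:
    """Split source_limit across partitions in round-robin passes.

    `available` maps a partition id to the number of unread messages in it.
    Each pass offers every still-active partition an equal share of the
    remaining budget; whatever a small partition cannot use is handed to
    the others in the next pass.
    """
    allocated = {p: 0 for p in available}
    remaining = source_limit
    # Partitions that still have unread messages.
    active = [p for p in sorted(available) if available[p] > 0]
    while remaining > 0 and active:
        share = max(remaining // len(active), 1)
        next_active = []
        for p in active:
            if remaining == 0:
                break
            take = min(share, available[p] - allocated[p], remaining)
            allocated[p] += take
            remaining -= take
            if allocated[p] < available[p]:
                next_active.append(p)
        active = next_active
    return allocated


# One heavily loaded partition and two nearly empty ones.
print(allocate_source_limit({0: 1000, 1: 10, 2: 10}, 100))  # {0: 80, 1: 10, 2: 10}
```

With a naive even split, partition 0 would be capped at a third of the limit and most of the budget would go unused; here the unused share from partitions 1 and 2 is reassigned so the full limit of 100 is consumed.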