1. Go through dependencies list one round to ensure compliance. Generated current NOTICE list in all submodules (other apache projects like flink does this).
To be on conservative side regarding licensing, NOTICE.txt lists all dependencies including transitive. Pending Compliance questions reported in https://issues.apache.org/jira/browse/LEGAL-461
2. Automate generating NOTICE.txt files to allow future package compliance issues be identified early as part of code-review process.
3. Added NOTICE.txt and LICENSE.txt to all HUDI jars
(1) Apply transformation when using delta-streamer to ingest data.
(2) Add Hudi Incremental Source for Delta Streamer
(3) Allow delta-streamer config-property to be passed as command-line
(4) Add Hive Integration to Delta-Streamer and address Review comments
(5) Ensure MultiPartKeysValueExtractor handle hive style partition description
(6) Reuse same spark session on both source and transformer
(7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource
(8) Reuse Binary Avro coders
(9) Add push down filter for Incremental source
(10) Add Hoodie DeltaStreamer metrics to track total time taken
- Tests redone in the process
- Main changes are to RealtimeRecordReader and how it treats maps/arrays
- Make hive sync work with Hive 1/2 and CDH environments
- Fixes to make corner cases for Hive queries
- Spark Hive integration - Working version across Apache and CDH versions
- Known Issue - https://github.com/uber/hudi/issues/439
- Standardize version of jackson
- DFSPropertiesConfiguration replaces usage of commons PropertiesConfiguration
- Remove dependency on ConstructorUtils
- Throw error if ordering value is not present, during key generation
- Switch to shade plugin for hoodie-utilities
- Added support for consumption for Confluent avro kafka serdes
- Support for Confluent schema registry
- KafkaSource now deals with skews nicely, by doing round robin allocation of source limit across partitions
- Added support for BULK_INSERT operations as well
- Pass in the payload class config properly into HoodieWriteClient
- Fix documentation based on new usage
- Adding tests on deltastreamer, sources and all new util classes.
The code-style rules follow google style with some changes:
1. Increase line length from 100 to 120
2. Disable JavaDoc related checkstyles as this needs more manual work.
Both source and test code are checked for code-style
- Introduced concept of converters to be able to serde generic datatype for SpillableMap
- Fixed/Added configs to Hoodie Configs
- Changed HoodieMergeHandle to start using SpillableMap
- HoodieLogFormat V2 has support for LogFormat evolution through versioning
- LogVersion is associated with a LogBlock not a LogFile
- Based on a version for a LogBlock, approporiate code path is executed
- Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff
- Implemented Reverse pointer to be able to traverse the log in reverse
- Introduce new MAGIC for backwards compatibility with logs without versions
- Reviving PR 191, to make FileSystem creation off actual path
- Streamline all filesystem access to HoodieTableMetaClient
- Hadoop Conf from Spark Context serialized & passed to executor code too
- Pick up env vars prefixed with HOODIE_ENV_ into Configuration object
- Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS
- Adding s3a to supported schemes & support escaping "." in env vars
- Tests use HoodieTestUtils.getDefaultHadoopConf