1. Remove dnl utils jar from git
2. Add LICENSE headers to files that were missing them
3. Fix NOTICE and LICENSE in all HUDI packages and at the top level
4. Fix License wording in certain HUDI source files
5. Include non-Java/Scala code in RAT licensing check
6. Use whitelist to include dependencies as part of timeline-server bundling
- From Spark 2.4 onwards, Spark has built-in support; shading is used to avoid conflicts
- Spark 2.3 still needs this bundled, so that dropping the bundle into the jars folder works
- Document the principles applied in the redesign at packaging/README.md
- No longer depends on several libraries, including commons-codec, commons-io, commons-pool, commons-dbcp, commons-lang, commons-logging, avro-mapred
- Introduce new FileIOUtils and add a checkstyle rule flagging illegal imports of the above
- Move Parquet and Avro dependencies to provided scope so they are picked up from Hive/Spark/Presto instead
- Pick up jackson jars for Hive sync tool from HIVE_HOME and unbundle jackson everywhere
- Remove hive-jdbc standalone jar from being bundled in Spark/Hive/Utilities bundles
- Reduce number of classes across bundles by 6.5x
- Redo all classes based on org.parquet only
- Remove unused dependencies like parquet-hadoop, common-configuration2
- timeline-service does not build a fat jar anymore
- Fix utilities and hadoop-mr bundles based on above
- [HUDI-172] Cleanup Maven POM/Classpath
- Fix ordering of dependencies in poms to enable better resolution
- Idea is to place more specific dependencies at the top
- And place dependencies that use them below them
- [HUDI-68] : Automate demo steps on docker setup
- Move hive queries from hive cli to beeline
- Standardize on taking query input from text command files
- Deltastreamer ingest also does hive sync in a single step
- Spark Incremental Query materialized as a derived Hive table using datasource
- Fix flakiness in HDFS spin up and output comparison
- Code cleanup to streamline and reduce LOC
- Also fixed pom to not shade some hive classes in spark, to enable hive sync