- Generalize the hive-sync module for syncing to multiple metastores
- Added new options for datasource
- Added new command line for delta streamer
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
- use codecov flags for each module to report coverage
- parallelize CI jobs to shorten build time
- add a testcase for MetricsReporterFactory (to trigger codecov comment)
- Upgrade Spark to 2.4.4, Parquet to 1.10.1, Avro to 1.8.2
- Remove spark-avro from hudi-spark-bundle. Users need to provide --packages org.apache.spark:spark-avro:2.4.4 when running spark-shell or spark-submit
- Replace com.databricks:spark-avro with org.apache.spark:spark-avro
- Shade Avro in hudi-hadoop-mr-bundle so that it does not conflict with Hive's Avro version.
- Add spotless format fixing to project
- One time reformatting for conformity
- Build fails on formatting violations; mvn spotless:apply auto-fixes them
- From Spark 2.4 onwards, Spark has built-in support; shading to avoid conflicts
- Spark 2.3 still needs this bundled, so that dropping the bundle into the jars folder works
- Redo all classes based on org.parquet only
- remove unused dependencies like parquet-hadoop, commons-configuration2
- timeline-service does not build a fat jar anymore
- Fix utilities and hadoop-mr bundles based on above
- [HUDI-172] Cleanup Maven POM/Classpath
- Fix ordering of dependencies in poms, to enable better resolution
- The idea is to place more specific dependencies at the top
- And place the dependencies that use them below
- [HUDI-68] : Automate demo steps on docker setup
- Move Hive queries from the Hive CLI to Beeline
- Standardize on taking query input from text command files
- DeltaStreamer ingest now also performs Hive sync in a single step
- Spark Incremental Query materialized as a derived Hive table using datasource
- Fix flakiness in HDFS spin up and output comparison
- Code cleanup to streamline the code and reduce LOC
- Also fixed pom to not shade some Hive classes in spark, to enable Hive sync