Improving out-of-box experience for data source

- Fixes #246
- Bump up default parallelism to 1500, to handle large upserts
- Add docs on S3 configuration & tuning tips with tested Spark knobs
- Fix bug to not duplicate hoodie metadata fields when input dataframe is another hoodie dataset
- Improve speed of ROTablePathFilter by removing directory check
- Move to spark-avro 4.0 to handle issue with nested fields with same name
- Keep AvroConversionUtils in sync with spark-avro 4.0
2018-01-05 14:06:18 -08:00
parent a97814462d
commit 85dd265b7b
8 changed files with 112 additions and 19 deletions
--- a/hoodie-spark/pom.xml
+++ b/hoodie-spark/pom.xml
@@ -142,7 +142,7 @@
    <dependency>
      <groupId>com.databricks</groupId>
      <artifactId>spark-avro_2.11</artifactId>
-      <version>3.2.0</version>
+      <version>4.0.0</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>