
Adding hoodie-spark to support Spark Datasource for Hoodie

 - Write with COW/MOR paths works fully
 - Read with RO view works on both storages*
 - Incremental view supported on COW
 - Refactored out HoodieReadClient methods, to just contain key-based access
 - HoodieDataSourceHelpers class can now be used to construct inputs to the datasource
 - Tests in hoodie-client using new helpers and mechanisms
 - Basic tests around save modes & inserts/upserts (more to follow)
 - Bumped Scala up to 2.11, since 2.10 is deprecated & fails with scalatest
 - Updated documentation to describe usage
 - New sample app written using the DataSource API
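The sample app mentioned above exercises the Spark DataSource API. A minimal sketch of a write and a read-optimized read might look like the following; note the format short name `com.uber.hoodie` and the option key are assumptions for illustration, not taken from this diff, and a running Spark environment is required:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object HoodieDataSourceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hoodie-datasource-sketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical local base path for the Hoodie table.
    val basePath = "file:///tmp/hoodie/sample_table"

    // Upsert a small DataFrame through the datasource.
    // Format name and option key below are illustrative assumptions.
    val df = Seq(("key1", "2017/08/28", "v1"))
      .toDF("_row_key", "partition", "value")

    df.write
      .format("com.uber.hoodie") // datasource short name (assumed)
      .option("hoodie.datasource.write.recordkey.field", "_row_key") // assumed key
      .mode(SaveMode.Overwrite)
      .save(basePath)

    // Read-optimized view: a plain datasource read over the base path.
    val readDf = spark.read.format("com.uber.hoodie").load(basePath + "/*/*")
    readDf.show()

    spark.stop()
  }
}
```

The incremental view on COW would use the same `read.format(...)` entry point with additional options selecting a begin commit time, which HoodieDataSourceHelpers can help construct.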
Vinoth Chandar
2017-08-28 01:28:08 -07:00
committed by vinoth chandar
parent c98ee057fc
commit 64e0573aca
44 changed files with 1830 additions and 331 deletions


@@ -33,6 +33,7 @@
 <module>hoodie-hadoop-mr</module>
 <module>hoodie-hive</module>
 <module>hoodie-utilities</module>
+<module>hoodie-spark</module>
 </modules>
 <licenses>
@@ -122,6 +123,8 @@
 <hive.version>1.1.0</hive.version>
 <metrics.version>3.1.1</metrics.version>
 <spark.version>2.1.0</spark.version>
+<scala.version>2.11.8</scala.version>
+<scala.libversion>2.11</scala.libversion>
 </properties>
 <scm>
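With the new `scala.libversion` property in place, Scala-suffixed artifact IDs can be derived from the property rather than hardcoded per dependency; a sketch of that pattern (this diff itself keeps the `_2.11` suffix literal):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_${scala.libversion}</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>
```

This way a future Scala bump only touches the two properties instead of every Spark artifact ID.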
@@ -436,13 +439,13 @@
 <dependency>
 <groupId>org.apache.spark</groupId>
-<artifactId>spark-core_2.10</artifactId>
+<artifactId>spark-core_2.11</artifactId>
 <version>${spark.version}</version>
 <scope>provided</scope>
 </dependency>
 <dependency>
 <groupId>org.apache.spark</groupId>
-<artifactId>spark-sql_2.10</artifactId>
+<artifactId>spark-sql_2.11</artifactId>
 <version>${spark.version}</version>
 <scope>provided</scope>
 </dependency>
@@ -512,7 +515,7 @@
 <dependency>
 <groupId>org.apache.commons</groupId>
 <artifactId>commons-configuration2</artifactId>
-<version>2.1</version>
+<version>2.1.1</version>
 </dependency>
 <dependency>