* `ZCurveOptimizeHelper` -> `ZOrderingIndexHelper`; Moved Z-index helper under `hudi.index.zorder` package
* Tidying up `ZOrderingIndexHelper`
* Fixing compilation
* Fixed index new/original table merging sequence to always prefer values from new index; Cleaned up `HoodieSparkUtils`
* Added test for `mergeIndexSql`
* Abstracted Z-index name composition w/in `ZOrderingIndexHelper`
* Fixed `DataSkippingUtils` to interrupt pruning in case data filter contains non-indexed column reference
* Properly handle exceptions originating during pruning in `HoodieFileIndex`
* Make sure no errors are logged upon encountering `AnalysisException`
* Cleaned up Z-index updating sequence; Tidying up comments, java-docs
* Fixed Z-index to properly handle changes of the list of clustered columns
* Tidying up
* `lint`
* Suppressing `JavaDocStyle` first sentence check
* Fixed compilation
* Fixing incorrect `DecimalType` conversion
* Refactored test `TestTableLayoutOptimization`
  - Added Z-index table composition test (against fixtures)
  - Separated out GC test; Tidying up
* Fixed tests re-shuffling column order for Z-index table `DataFrame` to align w/ the one loaded from JSON
* Scaffolded `DataTypeUtils` to do basic checks of Spark types; Added proper compatibility checking b/w old/new index-tables
* Added test for Z-index tables merging
* Fixed import being shaded by creating internal `hudi.util` package
* Fixed packaging for `TestOptimizeTable`
* Revised `updateMetadataIndex` seq to provide Z-index updating process w/ source table schema
* Make sure existing Z-index table schema is sync'd to source table's one
* Fixed shaded refs
* Fixed tests
* Fixed type conversion of Parquet provided metadata values into Spark expected schemas
* Fixed `composeIndexSchema` utility to propose proper schema
* Added more tests for Z-index:
  - Checking that Z-index table is built correctly
  - Checking that Z-index tables are merged correctly (during update)
* Fixing source table
* Fixing tests to read from Parquet w/ proper schema
* Refactored `ParquetUtils` utility reading stats from Parquet footers
* Fixed incorrect handling of Decimals extracted from Parquet footers
* Worked around issues in javac failing to compile stream's collection
* Fixed handling of `Date` type
* Fixed handling of `DateType` to be parsed as `LocalDate`
* Updated fixture; Make sure test loads Z-index fixture using proper schema
* Removed superfluous schema adjusting when reading from Parquet, since Spark is actually able to perfectly restore schema (given Parquet was previously written by Spark as well)
* Fixing race-condition in Parquet's `DateStringifier` trying to share `SimpleDateFormat` object which is inherently not thread-safe
* Tidying up
* Make sure schema is used upon reading to validate input files are in the appropriate format; Tidying up
* Worked around javac (1.8) inability to infer expression type properly
* Updated fixtures; Tidying up
* Fixing compilation after rebase
* Assert clustering have in Z-order layout optimization testing
* Tidying up exception messages
* XXX
* Added test validating Z-index lookup filter correctness
* Added more test-cases; Tidying up
* Added tests for string expressions
* Fixed incorrect Z-index filter lookup translations
* Added more test-cases
* Added proper handling of complex negations of AND/OR expressions by pushing NOT operator down into inner expressions for appropriate handling
* Added `-target:jvm-1.8` for `hudi-spark` module
* Adding more tests
* Added tests for non-indexed columns
* Properly handle non-indexed columns by falling back to a re-write of containing expression as `TrueLiteral` instead
* Fixed tests
* Removing the parquet test files and disabling corresponding tests

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
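
Two of the notes above (pushing the NOT operator down into AND/OR expressions, and rewriting predicates on non-indexed columns as `TrueLiteral`) can be illustrated with a small stand-alone sketch. This is not the code from `DataSkippingUtils`, and it does not use Spark Catalyst classes; every type and method name below (`Expr`, `Pred`, `TrueLit`, `pushNotDown`, `relaxNonIndexed`) is hypothetical and exists only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class NotPushDownSketch {
    // Hypothetical minimal expression tree (NOT Spark Catalyst).
    interface Expr {}

    static final class Pred implements Expr {
        final String column; // column the predicate refers to
        final String text;   // printable form, e.g. "a > 1"
        Pred(String column, String text) { this.column = column; this.text = text; }
        @Override public String toString() { return text; }
    }

    static final class Not implements Expr {
        final Expr child;
        Not(Expr child) { this.child = child; }
        @Override public String toString() { return "NOT(" + child + ")"; }
    }

    static final class And implements Expr {
        final Expr left, right;
        And(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " AND " + right + ")"; }
    }

    static final class Or implements Expr {
        final Expr left, right;
        Or(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " OR " + right + ")"; }
    }

    static final class TrueLit implements Expr {
        @Override public String toString() { return "TRUE"; }
    }

    // Push NOT inward with De Morgan's laws until it sits directly on a leaf.
    static Expr pushNotDown(Expr e) {
        if (e instanceof Not) {
            Expr c = ((Not) e).child;
            if (c instanceof Not) return pushNotDown(((Not) c).child); // NOT NOT x == x
            if (c instanceof And) {
                And a = (And) c;
                return new Or(pushNotDown(new Not(a.left)), pushNotDown(new Not(a.right)));
            }
            if (c instanceof Or) {
                Or o = (Or) c;
                return new And(pushNotDown(new Not(o.left)), pushNotDown(new Not(o.right)));
            }
            return e; // NOT over a leaf stays as it is
        }
        if (e instanceof And) return new And(pushNotDown(((And) e).left), pushNotDown(((And) e).right));
        if (e instanceof Or) return new Or(pushNotDown(((Or) e).left), pushNotDown(((Or) e).right));
        return e;
    }

    // Replace every leaf (plain or negated) on a non-indexed column with TRUE,
    // so that such a predicate can never exclude a file. Expects NOT to be pushed down already.
    static Expr relaxNonIndexed(Expr e, Set<String> indexed) {
        if (e instanceof And) return new And(relaxNonIndexed(((And) e).left, indexed), relaxNonIndexed(((And) e).right, indexed));
        if (e instanceof Or) return new Or(relaxNonIndexed(((Or) e).left, indexed), relaxNonIndexed(((Or) e).right, indexed));
        if (e instanceof Not && ((Not) e).child instanceof Pred) {
            return indexed.contains(((Pred) ((Not) e).child).column) ? e : new TrueLit();
        }
        if (e instanceof Pred) return indexed.contains(((Pred) e).column) ? e : new TrueLit();
        return e;
    }

    public static void main(String[] args) {
        Set<String> indexed = new HashSet<>(Arrays.asList("a")); // only column "a" is indexed
        Expr filter = new Not(new And(new Pred("a", "a > 1"), new Pred("b", "b = 2")));
        System.out.println(relaxNonIndexed(pushNotDown(filter), indexed)); // prints (NOT(a > 1) OR TRUE)
    }
}
```

The order matters in this sketch: if the non-indexed leaf were replaced with TRUE before the negation was pushed down, the outer NOT would turn it into FALSE and files could be pruned incorrectly.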
537 lines
16 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <parent>
    <artifactId>hudi-spark-datasource</artifactId>
    <groupId>org.apache.hudi</groupId>
    <version>0.10.0-SNAPSHOT</version>
  </parent>
  <modelVersion>4.0.0</modelVersion>

  <artifactId>hudi-spark_${scala.binary.version}</artifactId>
  <version>0.10.0-SNAPSHOT</version>

  <name>hudi-spark_${scala.binary.version}</name>
  <packaging>jar</packaging>

  <properties>
    <main.basedir>${project.parent.parent.basedir}</main.basedir>
  </properties>

  <build>
    <resources>
      <resource>
        <directory>src/main/resources</directory>
      </resource>
    </resources>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>net.alchim31.maven</groupId>
          <artifactId>scala-maven-plugin</artifactId>
          <version>${scala-maven-plugin.version}</version>
          <configuration>
            <args>
              <arg>-nobootcp</arg>
              <arg>-target:jvm-1.8</arg>
            </args>
            <checkMultipleScalaVersions>false</checkMultipleScalaVersions>
          </configuration>
        </plugin>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
        </plugin>
      </plugins>
    </pluginManagement>

    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <executions>
          <execution>
            <id>copy-dependencies</id>
            <phase>prepare-package</phase>
            <goals>
              <goal>copy-dependencies</goal>
            </goals>
            <configuration>
              <outputDirectory>${project.build.directory}/lib</outputDirectory>
              <overWriteReleases>true</overWriteReleases>
              <overWriteSnapshots>true</overWriteSnapshots>
              <overWriteIfNewer>true</overWriteIfNewer>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <executions>
          <execution>
            <id>scala-compile-first</id>
            <phase>process-resources</phase>
            <goals>
              <goal>add-source</goal>
              <goal>compile</goal>
            </goals>
          </execution>
          <execution>
            <id>scala-test-compile</id>
            <phase>process-test-resources</phase>
            <goals>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <executions>
          <execution>
            <phase>compile</phase>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>test-jar</goal>
            </goals>
            <phase>test-compile</phase>
          </execution>
        </executions>
        <configuration>
          <skip>false</skip>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.rat</groupId>
        <artifactId>apache-rat-plugin</artifactId>
      </plugin>
      <plugin>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest-maven-plugin</artifactId>
        <version>1.0</version>
        <configuration>
          <skipTests>${skipUTs}</skipTests>
          <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
          <junitxml>.</junitxml>
          <filereports>TestSuite.txt</filereports>
        </configuration>
        <executions>
          <execution>
            <id>test</id>
            <goals>
              <goal>test</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.scalastyle</groupId>
        <artifactId>scalastyle-maven-plugin</artifactId>
      </plugin>
      <plugin>
        <groupId>org.jacoco</groupId>
        <artifactId>jacoco-maven-plugin</artifactId>
      </plugin>
      <plugin>
        <groupId>org.antlr</groupId>
        <artifactId>antlr4-maven-plugin</artifactId>
        <version>${antlr.version}</version>
        <executions>
          <execution>
            <goals>
              <goal>antlr4</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <visitor>true</visitor>
          <listener>true</listener>
          <sourceDirectory>../hudi-spark/src/main/antlr4/</sourceDirectory>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <dependencies>
    <!-- Scala -->
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>

    <!-- Hoodie -->
    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>hudi-client-common</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>hudi-spark-client</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>hudi-common</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>hudi-hadoop-mr</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>hudi-hive-sync</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>hudi-sync-common</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>hudi-spark-common_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.apache.curator</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>${hudi.spark.module}_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.apache.hudi</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <!-- Logging -->
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
    </dependency>

    <!-- Fasterxml -->
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-annotations</artifactId>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.module</groupId>
      <artifactId>jackson-module-scala_${scala.binary.version}</artifactId>
    </dependency>

    <!-- Avro -->
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <exclusions>
        <exclusion>
          <!-- this version conflicts with spark-core_2.12 -->
          <groupId>com.thoughtworks.paranamer</groupId>
          <artifactId>paranamer</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <!-- Parquet -->
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-avro</artifactId>
    </dependency>

    <!-- Spark -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.binary.version}</artifactId>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.binary.version}</artifactId>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_${scala.binary.version}</artifactId>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.binary.version}</artifactId>
      <classifier>tests</classifier>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.binary.version}</artifactId>
      <classifier>tests</classifier>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-catalyst_${scala.binary.version}</artifactId>
      <classifier>tests</classifier>
      <scope>test</scope>
    </dependency>

    <!-- Spark (Packages) -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-avro_${scala.binary.version}</artifactId>
      <scope>provided</scope>
    </dependency>

    <!-- Hadoop -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>javax.servlet.jsp</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
      <scope>provided</scope>
    </dependency>

    <!-- Hive -->
    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-exec</artifactId>
      <version>${hive.version}</version>
      <classifier>${hive.exec.classifier}</classifier>
      <exclusions>
        <exclusion>
          <groupId>javax.mail</groupId>
          <artifactId>mail</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.eclipse.jetty.aggregate</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-jdbc</artifactId>
      <version>${hive.version}</version>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>javax.servlet.jsp</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-metastore</artifactId>
      <version>${hive.version}</version>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>javax.servlet.jsp</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-common</artifactId>
      <version>${hive.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.eclipse.jetty.orbit</groupId>
          <artifactId>javax.servlet</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>org.apache.curator</groupId>
      <artifactId>curator-framework</artifactId>
      <version>${zk-curator.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.curator</groupId>
      <artifactId>curator-client</artifactId>
      <version>${zk-curator.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.curator</groupId>
      <artifactId>curator-recipes</artifactId>
      <version>${zk-curator.version}</version>
    </dependency>

    <!-- Hoodie - Test -->
    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>hudi-client-common</artifactId>
      <version>${project.version}</version>
      <classifier>tests</classifier>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>hudi-spark-client</artifactId>
      <version>${project.version}</version>
      <classifier>tests</classifier>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>hudi-common</artifactId>
      <version>${project.version}</version>
      <classifier>tests</classifier>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.binary.version}</artifactId>
      <version>${scalatest.version}</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.junit.jupiter</groupId>
      <artifactId>junit-jupiter-api</artifactId>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.junit.jupiter</groupId>
      <artifactId>junit-jupiter-engine</artifactId>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.junit.vintage</groupId>
      <artifactId>junit-vintage-engine</artifactId>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.junit.jupiter</groupId>
      <artifactId>junit-jupiter-params</artifactId>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.mockito</groupId>
      <artifactId>mockito-junit-jupiter</artifactId>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.junit.platform</groupId>
      <artifactId>junit-platform-runner</artifactId>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.junit.platform</groupId>
      <artifactId>junit-platform-suite-api</artifactId>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>${slf4j.version}</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <classifier>tests</classifier>
      <scope>test</scope>
      <!-- Need these exclusions to make sure JavaSparkContext can be setup. https://issues.apache.org/jira/browse/SPARK-1693 -->
      <exclusions>
        <exclusion>
          <groupId>org.mortbay.jetty</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>javax.servlet.jsp</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</project>