1
0
Files
hudi/style/checkstyle.xml
Alexey Kudinkin 5755ff25a4 [HUDI-2814] Addressing issues w/ Z-order Layout Optimization (#4060)
* `ZCurveOptimizeHelper` > `ZOrderingIndexHelper`;
Moved Z-index helper under `hudi.index.zorder` package

* Tidying up `ZOrderingIndexHelper`

* Fixing compilation

* Fixed index new/original table merging sequence to always prefer values from new index;
Cleaned up `HoodieSparkUtils`

* Added test for `mergeIndexSql`

* Abstracted Z-index name composition w/in `ZOrderingIndexHelper`;

* Fixed `DataSkippingUtils` to interrupt prunning in case data filter contains non-indexed column reference

* Properly handle exceptions origination during pruning in `HoodieFileIndex`

* Make sure no errors are logged upon encountering `AnalysisException`

* Cleaned up Z-index updating sequence;
Tidying up comments, java-docs;

* Fixed Z-index to properly handle changes of the list of clustered columns

* Tidying up

* `lint`

* Suppressing `JavaDocStyle` first sentence check

* Fixed compilation

* Fixing incorrect `DecimalType` conversion

* Refactored test `TestTableLayoutOptimization`
  - Added Z-index table composition test (against fixtures)
  - Separated out GC test;
Tidying up

* Fixed tests re-shuffling column order for Z-Index table `DataFrame` to align w/ the one by one loaded from JSON

* Scaffolded `DataTypeUtils` to do basic checks of Spark types;
Added proper compatibility checking b/w old/new index-tables

* Added test for Z-index tables merging

* Fixed import being shaded by creating internal `hudi.util` package

* Fixed packaging for `TestOptimizeTable`

* Revised `updateMetadataIndex` seq to provide Z-index updating process w/ source table schema

* Make sure existing Z-index table schema is sync'd to source table's one

* Fixed shaded refs

* Fixed tests

* Fixed type conversion of Parquet provided metadata values into Spark expected schemas

* Fixed `composeIndexSchema` utility to propose proper schema

* Added more tests for Z-index:
  - Checking that Z-index table is built correctly
  - Checking that Z-index tables are merged correctly (during update)

* Fixing source table

* Fixing tests to read from Parquet w/ proper schema

* Refactored `ParquetUtils` utility reading stats from Parquet footers

* Fixed incorrect handling of Decimals extracted from Parquet footers

* Worked around issues in javac failign to compile stream's collection

* Fixed handling of `Date` type

* Fixed handling of `DateType` to be parsed as `LocalDate`

* Updated fixture;
Make sure test loads Z-index fixture using proper schema

* Removed superfluous scheme adjusting when reading from Parquet, since Spark is actually able to perfectly restore schema (given Parquet was previously written by Spark as well)

* Fixing race-condition in Parquet's `DateStringifier` trying to share `SimpleDataFormat` object which is inherently not thread-safe

* Tidying up

* Make sure schema is used upon reading to validate input files are in the appropriate format;
Tidying up;

* Worked around javac (1.8) inability to infer expression type properly

* Updated fixtures;
Tidying up

* Fixing compilation after rebase

* Assert clustering have in Z-order layout optimization testing

* Tidying up exception messages

* XXX

* Added test validating Z-index lookup filter correctness

* Added more test-cases;
Tidying up

* Added tests for string expressions

* Fixed incorrect Z-index filter lookup translations

* Added more test-cases

* Added proper handling on complex negations of AND/OR expressions by pushing NOT operator down into inner expressions for appropriate handling

* Added `-target:jvm-1.8` for `hudi-spark` module

* Adding more tests

* Added tests for non-indexed columns

* Properly handle non-indexed columns by falling back to a re-write of containing expression as  `TrueLiteral` instead

* Fixed tests

* Removing the parquet test files and disabling corresponding tests

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-11-26 10:02:15 -08:00

316 lines
15 KiB
XML

<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!DOCTYPE module PUBLIC
"-//Puppy Crawl//DTD Check Configuration 1.3//EN"
"http://checkstyle.sourceforge.net/dtds/configuration_1_3.dtd">
<!--
Checkstyle configuration that checks the Google coding conventions from Google Java Style
that can be found at https://google.github.io/styleguide/javaguide.html.
Checkstyle is very configurable. Be sure to read the documentation at
https://checkstyle.sourceforge.io/checks.html.
To completely disable a check, just comment it out or delete it from the file.
Authors: Max Vetrenko, Ruslan Diachenko, Roman Ivanov.
-->
<module name = "Checker">
<property name="charset" value="UTF-8"/>
<property name="severity" value="error"/>
<property name="fileExtensions" value="java, properties, xml"/>
<!-- Checks for whitespace -->
<!-- See http://checkstyle.sf.net/config_whitespace.html -->
<module name="FileTabCharacter">
<property name="eachLine" value="true"/>
</module>
<module name="SuppressionFilter">
<property name="file" value="${checkstyle.suppressions.file}" default="checkstyle-suppressions.xml"/>
<property name="optional" value="true"/>
</module>
<module name="SuppressWarningsFilter" />
<module name="TreeWalker">
<module name="AvoidNestedBlocks">
<property name="allowInSwitchCase" value="true"/>
</module>
<module name="SuppressWarningsHolder"/>
<module name="OuterTypeFilename"/>
<module name="IllegalTokenText">
<property name="tokens" value="STRING_LITERAL, CHAR_LITERAL"/>
<property name="format" value="\\u00(09|0(a|A)|0(c|C)|0(d|D)|22|27|5(C|c))|\\(0(10|11|12|14|15|42|47)|134)"/>
<property name="message" value="Consider using special escape sequence instead of octal value or Unicode escaped value."/>
</module>
<module name="AvoidEscapedUnicodeCharacters">
<property name="allowEscapesForControlCharacters" value="true"/>
<property name="allowByTailComment" value="true"/>
<property name="allowNonPrintableEscapes" value="true"/>
</module>
<module name="LineLength">
<property name="max" value="200"/>
<property name="ignorePattern" value="^ *\* *[^ ]+$"/>
</module>
<module name="OneTopLevelClass"/>
<module name="NoLineWrap"/>
<module name="EmptyBlock">
<property name="option" value="TEXT"/>
<property name="tokens" value="LITERAL_TRY, LITERAL_FINALLY, LITERAL_IF, LITERAL_ELSE, LITERAL_SWITCH"/>
</module>
<module name="NeedBraces"/>
<module name="LeftCurly"/>
<module name="RightCurly">
<property name="id" value="RightCurlySame"/>
<property name="tokens" value="LITERAL_TRY, LITERAL_CATCH, LITERAL_FINALLY, LITERAL_IF, LITERAL_ELSE, LITERAL_DO"/>
</module>
<module name="RightCurly">
<property name="id" value="RightCurlyAlone"/>
<property name="option" value="alone"/>
<property name="tokens" value="CLASS_DEF, METHOD_DEF, CTOR_DEF, LITERAL_FOR, LITERAL_WHILE, STATIC_INIT, INSTANCE_INIT"/>
</module>
<module name="WhitespaceAround">
<property name="allowEmptyConstructors" value="true"/>
<property name="allowEmptyMethods" value="true"/>
<property name="allowEmptyTypes" value="true"/>
<property name="allowEmptyLoops" value="true"/>
<message key="ws.notFollowed"
value="WhitespaceAround: ''{0}'' is not followed by whitespace. Empty blocks may only be represented as '{}' when not part of a multi-block statement (4.1.3)"/>
<message key="ws.notPreceded"
value="WhitespaceAround: ''{0}'' is not preceded with whitespace."/>
</module>
<module name="OneStatementPerLine"/>
<module name="MultipleVariableDeclarations"/>
<module name="ArrayTypeStyle"/>
<module name="MissingSwitchDefault"/>
<module name="FallThrough"/>
<module name="UpperEll"/>
<module name="ModifierOrder"/>
<module name="EmptyLineSeparator">
<property name="allowNoEmptyLineBetweenFields" value="true"/>
<property name="allowMultipleEmptyLines" value="false"/>
<property name="tokens" value="PACKAGE_DEF, IMPORT, CLASS_DEF, INTERFACE_DEF, ENUM_DEF,
STATIC_INIT, INSTANCE_INIT, METHOD_DEF, CTOR_DEF"/>
</module>
<module name="SeparatorWrap">
<property name="id" value="SeparatorWrapDot"/>
<property name="tokens" value="DOT"/>
<property name="option" value="nl"/>
</module>
<module name="SeparatorWrap">
<property name="id" value="SeparatorWrapComma"/>
<property name="tokens" value="COMMA"/>
<property name="option" value="EOL"/>
</module>
<module name="SeparatorWrap">
<!-- ELLIPSIS is EOL until https://github.com/google/styleguide/issues/258 -->
<property name="id" value="SeparatorWrapEllipsis"/>
<property name="tokens" value="ELLIPSIS"/>
<property name="option" value="EOL"/>
</module>
<module name="SeparatorWrap">
<!-- ARRAY_DECLARATOR is EOL until https://github.com/google/styleguide/issues/259 -->
<property name="id" value="SeparatorWrapArrayDeclarator"/>
<property name="tokens" value="ARRAY_DECLARATOR"/>
<property name="option" value="EOL"/>
</module>
<module name="SeparatorWrap">
<property name="id" value="SeparatorWrapMethodRef"/>
<property name="tokens" value="METHOD_REF"/>
<property name="option" value="nl"/>
</module>
<module name="PackageName">
<property name="format" value="^[a-z]+(\.[a-z][a-z0-9]*)*$"/>
<message key="name.invalidPattern"
value="Package name ''{0}'' must match pattern ''{1}''."/>
</module>
<module name="TypeName">
<message key="name.invalidPattern"
value="Type name ''{0}'' must match pattern ''{1}''."/>
</module>
<module name="MemberName">
<property name="format" value="^[a-z][a-zA-Z0-9]*$"/>
<message key="name.invalidPattern"
value="Member name ''{0}'' must match pattern ''{1}''."/>
</module>
<module name="ParameterName">
<property name="format" value="^[a-z]([a-zA-Z0-9]*)?$"/>
<message key="name.invalidPattern"
value="Parameter name ''{0}'' must match pattern ''{1}''."/>
</module>
<module name="CatchParameterName">
<property name="format" value="^[a-z]([a-zA-Z0-9]*)?$"/>
<message key="name.invalidPattern"
value="Catch parameter name ''{0}'' must match pattern ''{1}''."/>
</module>
<module name="LocalVariableName">
<property name="tokens" value="VARIABLE_DEF"/>
<property name="format" value="^[a-z]([a-zA-Z0-9]*)?$"/>
<message key="name.invalidPattern"
value="Local variable name ''{0}'' must match pattern ''{1}''."/>
</module>
<module name="ClassTypeParameterName">
<property name="format" value="(^[A-Z][0-9]?)$|([A-Z][a-zA-Z0-9]*[T]$)"/>
<message key="name.invalidPattern"
value="Class type name ''{0}'' must match pattern ''{1}''."/>
</module>
<module name="MethodTypeParameterName">
<property name="format" value="(^[A-Z][0-9]?)$|([A-Z][a-zA-Z0-9]*[T]$)"/>
<message key="name.invalidPattern"
value="Method type name ''{0}'' must match pattern ''{1}''."/>
</module>
<module name="InterfaceTypeParameterName">
<property name="format" value="(^[A-Z][0-9]?)$|([A-Z][a-zA-Z0-9]*[T]$)"/>
<message key="name.invalidPattern"
value="Interface type name ''{0}'' must match pattern ''{1}''."/>
</module>
<module name="NoFinalizer"/>
<module name="GenericWhitespace">
<message key="ws.followed"
value="GenericWhitespace ''{0}'' is followed by whitespace."/>
<message key="ws.preceded"
value="GenericWhitespace ''{0}'' is preceded with whitespace."/>
<message key="ws.illegalFollow"
value="GenericWhitespace ''{0}'' should followed by whitespace."/>
<message key="ws.notPreceded"
value="GenericWhitespace ''{0}'' is not preceded with whitespace."/>
</module>
<module name="Indentation">
<property name="basicOffset" value="2"/>
<property name="braceAdjustment" value="0"/>
<property name="caseIndent" value="2"/>
<property name="throwsIndent" value="4"/>
<property name="lineWrappingIndentation" value="4"/>
<property name="arrayInitIndent" value="2"/>
</module>
<!--
<module name="AbbreviationAsWordInName">
<property name="ignoreFinal" value="false"/>
<property name="allowedAbbreviationLength" value="4"/>
</module>
<module name="OverloadMethodsDeclarationOrder"/>
<module name="VariableDeclarationUsageDistance">
<property name="allowedDistance" value="5"/>
</module>
-->
<module name="UnusedImports"/>
<module name="RedundantImport"/>
<module name="MethodParamPad"/>
<module name="NoWhitespaceBefore">
<property name="tokens" value="COMMA, SEMI, POST_INC, POST_DEC, DOT, ELLIPSIS, METHOD_REF"/>
<property name="allowLineBreaks" value="true"/>
</module>
<module name="ParenPad"/>
<module name="OperatorWrap">
<property name="option" value="NL"/>
<property name="tokens" value="BAND, BOR, BSR, BXOR, DIV, EQUAL, GE, GT, LAND, LE, LITERAL_INSTANCEOF, LOR, LT, MINUS, MOD, NOT_EQUAL, PLUS, QUESTION, SL, SR, STAR, METHOD_REF "/>
</module>
<module name="AnnotationLocation">
<property name="id" value="AnnotationLocationMostCases"/>
<property name="tokens" value="CLASS_DEF, INTERFACE_DEF, ENUM_DEF, METHOD_DEF, CTOR_DEF"/>
</module>
<module name="AnnotationLocation">
<property name="id" value="AnnotationLocationVariables"/>
<property name="tokens" value="VARIABLE_DEF"/>
<property name="allowSamelineMultipleAnnotations" value="true"/>
</module>
<!--
<module name="NonEmptyAtclauseDescription"/>
<module name="JavadocTagContinuationIndentation"/>
<module name="SummaryJavadoc">
<property name="forbiddenSummaryFragments" value="^@return the *|^This method returns |^A [{]@code [a-zA-Z0-9]+[}]( is a )"/>
</module>
<module name="JavadocParagraph"/>
<module name="AtclauseOrder">
<property name="tagOrder" value="@param, @return, @throws, @deprecated"/>
<property name="target" value="CLASS_DEF, INTERFACE_DEF, ENUM_DEF, METHOD_DEF, CTOR_DEF, VARIABLE_DEF"/>
</module>
<module name="JavadocMethod">
<property name="scope" value="public"/>
<property name="allowMissingParamTags" value="true"/>
<property name="allowMissingThrowsTags" value="true"/>
<property name="allowMissingReturnTag" value="true"/>
<property name="minLineCount" value="2"/>
<property name="allowedAnnotations" value="Override, Test"/>
<property name="allowThrowsTagsForSubclasses" value="true"/>
</module>
-->
<module name="MethodName">
<property name="format" value="^[a-z][a-z0-9][a-zA-Z0-9_]*$"/>
<message key="name.invalidPattern"
value="Method name ''{0}'' must match pattern ''{1}''."/>
</module>
<!--
<module name="SingleLineJavadoc">
<property name="ignoreInlineTags" value="false"/>
</module>
-->
<module name="EmptyCatchBlock">
<property name="exceptionVariableName" value="expected"/>
</module>
<module name="CommentsIndentation"/>
<module name="IllegalImport">
<property name="regexp" value="true"/>
<property name="illegalPkgs" value="org\.apache\.commons, com\.google\.common"/>
<property name="illegalClasses"
value="^java\.util\.Optional, ^org\.junit\.(?!jupiter|platform|contrib|Rule|runner)(.*)"/>
</module>
<module name="EmptyStatement" />
<!-- Checks for Java Docs. -->
<module name="JavadocStyle">
<property name="checkFirstSentence" value="false"/>
</module>
<module name="JavadocType">
<property name="severity" value="info"/>
<property name="scope" value="protected"/>
<property name="allowMissingParamTags" value="true"/>
</module>
<!-- Checks for out of order import statements. -->
<module name="ImportOrder">
<property name="severity" value="info"/>
<property name="groups" value="org.apache.hudi,*,javax,java,scala"/>
<property name="separated" value="true"/>
<property name="sortStaticImportsAlphabetically" value="true"/>
<property name="option" value="bottom"/>
<property name="tokens" value="STATIC_IMPORT, IMPORT"/>
<message key="import.ordering"
value="Import {0} appears after other imports that it should precede"/>
</module>
<!-- Checks for star import. -->
<module name="AvoidStarImport" />
<!-- Checks for constant name. -->
<module name="ConstantName" />
<!-- Checks for simple boolean expressions. -->
<module name="SimplifyBooleanExpression"/>
<module name="ImportControl">
<property name="file" value="${basedir}/style/import-control.xml"/>
<property name="path" value="^.*[\\/]hudi-client[\\/]hudi-client-common[\\/]src[\\/].*$"/>
</module>
</module>
</module>