[HUDI-2788] Fixing issues w/ Z-order Layout Optimization (#4026)

* Simplifying, tidying up

* Fixed packaging for `TestOptimizeTable`

* Cleaned up `HoodieFileIndex` file filtering seq;
Removed the optimization that read the Parquet table manually, circumventing Spark

* Refactored `DataSkippingUtils`:
  - Fixed checks to validate all statistics cols are present
  - Fixed some predicates being constructed incorrectly
  - Rewrote comments for easier comprehension, added more notes
  - Tidying up

* Tidying up tests

* `lint`

* Fixing compilation

* Renamed `TestOptimizeTable` to `TestTableLayoutOptimization`;
Added assertions to test data skipping paths

* Fixed tests to properly hit data-skipping path

* Fixed pruned files candidates lookup seq to conservatively include all non-indexed files

* Added java-doc

* Fixed compilation
Alexey Kudinkin
2021-11-24 10:10:28 -08:00
committed by GitHub
parent 973f78f5ca
commit 60b23b9797
6 changed files with 289 additions and 181 deletions

@@ -298,8 +298,8 @@ object HoodieSparkUtils extends SparkAdapterSupport {
    */
   def createMergeSql(leftTable: String, rightTable: String, cols: Seq[String]): String = {
     var selectsql = ""
-    for (i <- (0 to cols.size-1)) {
-      selectsql = selectsql + s" if (${leftTable}.${cols(0)} is null, ${rightTable}.${cols(i)}, ${leftTable}.${cols(i)}) as ${cols(i)} ,"
+    for (i <- cols.indices) {
+      selectsql = selectsql + s" if (${leftTable}.${cols(i)} is null, ${rightTable}.${cols(i)}, ${leftTable}.${cols(i)}) as ${cols(i)} ,"
     }
     "select " + selectsql.dropRight(1) + s" from ${leftTable} full join ${rightTable} on ${leftTable}.${cols(0)} = ${rightTable}.${cols(0)}"
   }
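
To make the effect of this hunk easier to see, here is a minimal standalone sketch of the select expressions built before and after the change. The object name `MergeSqlSketch` and the helpers `selectBefore` and `selectAfter` are hypothetical names introduced for illustration only; they are not part of `HoodieSparkUtils`, and the sketch builds only the per-column expressions, not the full `select ... full join ...` statement.

```scala
object MergeSqlSketch {
  // Pre-change behaviour (hypothetical helper): the null check always
  // inspects the first column, whichever column is being selected.
  def selectBefore(left: String, right: String, cols: Seq[String]): Seq[String] =
    cols.map(c => s"if ($left.${cols.head} is null, $right.$c, $left.$c) as $c")

  // Post-change behaviour (hypothetical helper): the null check inspects
  // the same column that is being selected.
  def selectAfter(left: String, right: String, cols: Seq[String]): Seq[String] =
    cols.map(c => s"if ($left.$c is null, $right.$c, $left.$c) as $c")

  def main(args: Array[String]): Unit = {
    val cols = Seq("id", "name")
    println("before:")
    selectBefore("l", "r", cols).foreach(println)
    println("after:")
    selectAfter("l", "r", cols).foreach(println)
  }
}
```

For the first column the two variants produce the same expression; they differ only for the remaining columns, where the old code kept testing `cols(0)` for null instead of `cols(i)`.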