* [HUDI-3389] fix ColumnarArrayData ClassCastException issue
* [HUDI-3389] remove MapColumnVector.java, RowColumnVector.java, and add test case for array<int> field
Fix dependency conflict
Fix repairs command
Implement putIfAbsent for DDB lock provider
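A minimal sketch of putIfAbsent-style lock acquisition, assuming each lock is a single item keyed by the lock key and acquisition succeeds only when no item exists yet. In DynamoDB this kind of check is typically a conditional write (e.g. `attribute_not_exists`). Here an in-memory `ConcurrentHashMap` stands in for the table. `LockTableSketch` and its methods are hypothetical, and the provider's actual client calls are not shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for the lock table; this is NOT the DynamoDB client API.
class LockTableSketch {
    // lock key -> current owner; plays the role of one item per lock in the table.
    private final ConcurrentMap<String, String> items = new ConcurrentHashMap<>();

    // Acquire only if no item exists for this key, like a conditional put on a missing key.
    boolean tryAcquire(String lockKey, String owner) {
        return items.putIfAbsent(lockKey, owner) == null;
    }

    // Release only if the caller still owns the lock (remove-if-value-matches).
    boolean release(String lockKey, String owner) {
        return items.remove(lockKey, owner);
    }
}
```

Because the existence check and the write happen atomically, two writers can never both observe "no lock" and proceed.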
Add upgrade step and validate while fetching configs
Validate checksum for latest table version only while fetching config
Move generateChecksum to BinaryUtil
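A sketch of validating a config checksum while fetching configs. It assumes the checksum is a CRC32 over the table name and that only tables already on the latest version are expected to carry one. The property keys, the version number `4`, and `ChecksumSketch` are hypothetical. The real `BinaryUtil.generateChecksum` and table config handling may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.CRC32;

class ChecksumSketch {
    // Hypothetical keys and version number, for illustration only.
    static final String NAME_KEY = "table.name";
    static final String CHECKSUM_KEY = "table.checksum";
    static final String VERSION_KEY = "table.version";
    static final int LATEST_VERSION = 4;

    // CRC32 over raw bytes, analogous in spirit to a shared generateChecksum helper.
    static long generateChecksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    static long checksumOf(Properties props) {
        return generateChecksum(props.getProperty(NAME_KEY).getBytes(StandardCharsets.UTF_8));
    }

    // Upgrade step: stamp the checksum and bump the table version.
    static void upgrade(Properties props) {
        props.setProperty(CHECKSUM_KEY, Long.toString(checksumOf(props)));
        props.setProperty(VERSION_KEY, Integer.toString(LATEST_VERSION));
    }

    // Validate only for the latest table version; older tables have no checksum yet.
    static boolean isValid(Properties props) {
        int version = Integer.parseInt(props.getProperty(VERSION_KEY, "0"));
        if (version < LATEST_VERSION) {
            return true;
        }
        String stored = props.getProperty(CHECKSUM_KEY);
        return stored != null && Long.parseLong(stored) == checksumOf(props);
    }
}
```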
Rebase and resolve conflict
Fix table version check
Currently, HadoopFsRelation uses the value parsed from the actual partition path as the value of the partition field. Unlike a normal table, however, Hudi persists the partition value in the parquet file, and in some cases the value in the partition path differs from the persisted value of the partition field.
So here we implement BaseFileOnlyViewRelation, which lets Hudi manage its own relation.
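One illustrative way the two values can diverge, assuming a hypothetical key generator that formats a timestamp field into a day-level partition path. The field stores `2022-01-15 10:30:00` in the parquet file, but the path is `2022/01/15`, so a relation that reads the partition value off the path reports something different. The formats and `PartitionValueSketch` are assumptions for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class PartitionValueSketch {
    // Hypothetical formats: how the field is stored vs. how the partition path is built.
    static final DateTimeFormatter FIELD_FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    static final DateTimeFormatter PATH_FMT = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Key-generator-like transform: the time component is dropped and the separators change.
    static String partitionPath(String fieldValue) {
        return LocalDateTime.parse(fieldValue, FIELD_FMT).format(PATH_FMT);
    }
}
```

Since the transform is lossy, the persisted field value cannot be recovered from the path alone, which is why Hudi needs to read it from the file.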
- This adds a restore plan and serializes it to the restore.requested meta file in the timeline. This also introduces separate schedule and execution phases for restore, which did not exist before.
- This adds support in spark-datasource for only scheduling table services inline, so that users can leverage async execution without needing a lock service provider.
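A minimal sketch of the schedule/execute split, assuming a timeline where an action moves through requested, inflight, and completed states. Scheduling only persists the plan, so execution can run later or in another process. `TwoPhaseTimelineSketch` and its methods are hypothetical and do not mirror Hudi's timeline API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a two-phase timeline action; not Hudi's timeline API.
class TwoPhaseTimelineSketch {
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    private final Map<String, State> instants = new LinkedHashMap<>();
    private final Map<String, List<String>> plans = new LinkedHashMap<>();

    // Schedule: persist the plan (e.g. the instants a restore will roll back) as a requested instant.
    void schedule(String instantTime, List<String> plan) {
        if (instants.containsKey(instantTime)) {
            throw new IllegalStateException("already scheduled: " + instantTime);
        }
        plans.put(instantTime, new ArrayList<>(plan));
        instants.put(instantTime, State.REQUESTED);
    }

    // Execute: pick up a previously requested plan; the actual rollback work would happen here.
    List<String> execute(String instantTime) {
        if (instants.get(instantTime) != State.REQUESTED) {
            throw new IllegalStateException("not requested: " + instantTime);
        }
        instants.put(instantTime, State.INFLIGHT);
        List<String> processed = new ArrayList<>(plans.get(instantTime));
        instants.put(instantTime, State.COMPLETED);
        return processed;
    }

    State stateOf(String instantTime) {
        return instants.get(instantTime);
    }
}
```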
Rebased Parquet-based FileInputFormat impls to inherit from MapredParquetInputFormat, so that Hive correctly recognizes those impls and applies the corresponding optimizations.
- Converted HoodieRealtimeFileInputFormatBase and HoodieFileInputFormatBase into standalone implementations that can be instantiated as standalone objects (e.g., for delegation)
- Renamed HoodieFileInputFormatBase > HoodieCopyOnWriteTableInputFormat, HoodieRealtimeFileInputFormatBase > HoodieMergeOnReadTableInputFormat
- Scaffolded HoodieParquetFileInputFormatBase for all Parquet impls to inherit from
- Rebased Parquet impls onto HoodieParquetFileInputFormatBase
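A rough sketch of the inherit-and-delegate shape described above. The Parquet impl extends the Hive-facing base so Hive recognizes it, and it forwards Hudi-specific file listing to a standalone COW format instance. Every class here is a hypothetical stand-in with no Hadoop or Hive dependency. The "latest commit" filter is a simplification for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Hive's parquet input format base class (hypothetical, no Hadoop dependency).
class ParquetInputFormatStandIn {
    List<String> listStatus(List<String> files) {
        return new ArrayList<>(files);
    }
}

// Standalone COW logic that can be instantiated on its own and used as a delegate (hypothetical).
class CopyOnWriteFormatSketch {
    // Simplified: keep only files whose name ends with the latest commit time.
    List<String> listStatus(List<String> files, String latestCommit) {
        List<String> out = new ArrayList<>();
        for (String f : files) {
            if (f.endsWith("_" + latestCommit + ".parquet")) {
                out.add(f);
            }
        }
        return out;
    }
}

// Inherits from the Hive-facing base, but delegates Hudi-specific listing to the standalone impl.
class HudiParquetFormatSketch extends ParquetInputFormatStandIn {
    private final CopyOnWriteFormatSketch delegate = new CopyOnWriteFormatSketch();
    private final String latestCommit;

    HudiParquetFormatSketch(String latestCommit) {
        this.latestCommit = latestCommit;
    }

    @Override
    List<String> listStatus(List<String> files) {
        return delegate.listStatus(files, latestCommit);
    }
}
```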
* [HUDI-3091] Making SIMPLE index as the default index type
* Fixing tests
* Triaging timeouts
* disable SIMPLE index for bootstrap tests
* removing test run start and end log statements
* Fixing simple index parallelism for some tests
* Disabling failing test for now
* reverting previous disable
* Reverting all changes
* fixing azure pipeline script
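A minimal sketch of what SIMPLE index tagging does conceptually, assuming an in-memory view of the record keys stored in each existing base file of a partition. Incoming keys are joined against existing keys: a match means an update to that file, and no match means an insert. `SimpleIndexSketch` is hypothetical and ignores distribution and parallelism.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of simple-index tagging over an in-memory view of base files.
class SimpleIndexSketch {
    // existingFiles: fileId -> record keys currently stored in that base file.
    // Returns incoming key -> fileId it lives in, or null when the record is a new insert.
    static Map<String, String> tag(List<String> incomingKeys, Map<String, List<String>> existingFiles) {
        Map<String, String> keyToFile = new HashMap<>();
        for (Map.Entry<String, List<String>> e : existingFiles.entrySet()) {
            for (String key : e.getValue()) {
                keyToFile.put(key, e.getKey());
            }
        }
        Map<String, String> tagged = new LinkedHashMap<>();
        for (String key : incomingKeys) {
            tagged.put(key, keyToFile.get(key));
        }
        return tagged;
    }
}
```

Because it reads the actual keys rather than probabilistic filters, this approach has no false positives, at the cost of scanning keys from every base file in the affected partitions.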
Unify Hive's MOR implementations to avoid duplication across implementations for different file formats (Parquet, HFile, etc.)
- Extracted HoodieRealtimeFileInputFormatBase (extending the COW base, HoodieFileInputFormatBase)
- Rebased Parquet, HFile implementations onto HoodieRealtimeFileInputFormatBase
- Tidying up
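A sketch of the unified hierarchy, assuming the MOR base adds log merging once on top of the COW base and each file format supplies only how to read its base file. All class and method names are hypothetical, and string concatenation stands in for actual reading and merging.

```java
// Hypothetical shape of the hierarchy; names are illustrative only.
abstract class CowFormatBaseSketch {
    // Format-specific hook: how to read a base file.
    abstract String readBaseFile(String path);

    String read(String path) {
        return readBaseFile(path);
    }
}

// MOR base: log merging is implemented once here, shared by every file format.
abstract class MorFormatBaseSketch extends CowFormatBaseSketch {
    @Override
    String read(String path) {
        return "merged(" + super.read(path) + ")";
    }
}

class ParquetMorSketch extends MorFormatBaseSketch {
    @Override
    String readBaseFile(String path) {
        return "parquet(" + path + ")";
    }
}

class HFileMorSketch extends MorFormatBaseSketch {
    @Override
    String readBaseFile(String path) {
        return "hfile(" + path + ")";
    }
}
```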