* [HUDI-892] RealtimeParquetInputFormat skip adding projection columns if there are no log files
* [HUDI-892] for test
* [HUDI-892] fix bug when generating array from split
* [HUDI-892] revert test log
1. Added the --clean-input and --clean-output parameters to clean the input and output directories before starting the job
2. Added the --delete-old-input parameter to delete older batches for data already ingested. This helps keep the number of redundant files low.
3. Added the --input-parallelism parameter to restrict the parallelism when generating input data. This helps keep the number of generated input files low.
4. Added a start_offset option to DAG nodes. Without the ability to specify start offsets, data is generated into existing partitions. With a start offset, the DAG can control which partition the data is written to.
5. Fixed record generation so that the correct number of partitions is created
- In the existing implementation, the partition is chosen as a random long. This does not guarantee that exactly the requested number of partitions is created.
6. Changed variable blacklistedFields to be a Set, as that is faster than a List for membership checks.
7. Fixed integer division for Math.ceil. If two integers are divided, the result is not a double unless one of the integers is cast to double.
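
The integer-division pitfall in item 7 can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not taken from the Hudi code base; they only show why the cast is needed.

```java
class CeilDivisionExample {

    // Buggy form: totalRecords / recordsPerBatch is integer division, so the
    // fractional part is discarded before Math.ceil ever sees the value.
    static int batchesTruncated(int totalRecords, int recordsPerBatch) {
        return (int) Math.ceil(totalRecords / recordsPerBatch);
    }

    // Fixed form: casting one operand to double forces floating-point
    // division, so Math.ceil can round the quotient up.
    static int batchesCeil(int totalRecords, int recordsPerBatch) {
        return (int) Math.ceil((double) totalRecords / recordsPerBatch);
    }

    public static void main(String[] args) {
        System.out.println(batchesTruncated(10, 3)); // prints 3
        System.out.println(batchesCeil(10, 3));      // prints 4
    }
}
```

With 10 records and 3 records per batch, the truncated form yields 3 batches and silently drops the last record, while the cast form yields 4.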
* [HUDI-1326] Added an API to force publish metrics and flush them.
Using the added API, hudi-test-suite now publishes metrics after each level of the DAG completes.
* Code cleanups
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
- config field is no longer transient in key generator
- verified that the key generator object is shipped from the driver to executors, just the one time and reused for each record
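
Why a transient config field matters can be sketched as follows, assuming the key generator is shipped with plain Java serialization. `SketchKeyGenerator` and the `recordkey.field` key are hypothetical stand-ins, not Hudi's actual class or property name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class KeyGeneratorSerializationExample {

    // Hypothetical stand-in for a key generator; not Hudi's actual class.
    static class SketchKeyGenerator implements Serializable {
        private static final long serialVersionUID = 1L;

        // Non-transient: written with the object, so it survives shipping.
        private final HashMap<String, String> config;
        // Transient: skipped by Java serialization, null after deserialization.
        private transient HashMap<String, String> transientConfig;

        SketchKeyGenerator(Map<String, String> config) {
            this.config = new HashMap<>(config);
            this.transientConfig = new HashMap<>(config);
        }

        String recordKeyField() {
            return config.get("recordkey.field");
        }

        boolean transientConfigPresent() {
            return transientConfig != null;
        }
    }

    // Serializes and deserializes the generator in memory, mimicking one
    // driver-to-executor transfer.
    static SketchKeyGenerator roundTrip(SketchKeyGenerator generator) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(generator);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchKeyGenerator) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("recordkey.field", "uuid");
        SketchKeyGenerator copy = roundTrip(new SketchKeyGenerator(props));
        System.out.println(copy.recordKeyField());         // config survived
        System.out.println(copy.transientConfigPresent()); // transient copy did not
    }
}
```

After the round trip the non-transient field still holds the configuration, while the transient one is null and would have to be rebuilt on the receiving side.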
The current DFSPathSelector only ignores the prefixes (_, .) at the file level, while files under subdirectories,
e.g. (.checkpoint/*), are still considered, which results in a bad-format exception during reading.
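
One way to apply the prefix check to every level of a path, rather than to the file name alone, is sketched below. This is an illustration only: it uses `java.nio.file` instead of the Hadoop file system API so that it stays self-contained, the helper name `shouldIgnore` is hypothetical, and the path is assumed to be relative to the source root.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class HiddenPathFilterExample {

    // Returns true if any component of the relative path (directory or file
    // name) starts with '_' or '.', so files under .checkpoint/ are skipped too.
    static boolean shouldIgnore(String relativePath) {
        for (Path component : Paths.get(relativePath)) {
            String name = component.toString();
            if (name.startsWith("_") || name.startsWith(".")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldIgnore("2020/10/part-0001.parquet"));     // prints false
        System.out.println(shouldIgnore(".checkpoint/part-0001.parquet")); // prints true
        System.out.println(shouldIgnore("2020/10/_SUCCESS"));              // prints true
    }
}
```

A file-level check would accept `.checkpoint/part-0001.parquet` because the file name itself has no ignored prefix; checking every component rejects it.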
- Update hudi-spark-bundle pom to not relocate the hbase and htrace patterns
- Remove the codec relocation, which was causing an error because codec is not included in the bundle
Remove APIs in `HoodieTestUtils`
- `createCommitFiles`
- `createDataFile`
- `createNewLogFile`
- `createCompactionRequest`
Migrated usages in `TestCleaner#testPendingCompactions`.
Also improved some API names in `HoodieTestTable`.
When the HBase index is used and a record's partition changes to another partition, the path does not change to match the new value of the partition column
Co-authored-by: huangjing <huangjing@clinbrain.com>