Remove APIs in `HoodieTestUtils`
- `createCommitFiles`
- `createDataFile`
- `createNewLogFile`
- `createCompactionRequest`
Migrated usages in `TestCleaner#testPendingCompactions`.
Also improved some API names in `HoodieTestTable`.
When the HBase index is used and a record's partition changes, the record's path is not updated to match the new value of the partition column.
Co-authored-by: huangjing <huangjing@clinbrain.com>
1. Use the DAG Node's label from the yaml as its name instead of UUID names which are not descriptive when debugging issues from logs.
2. Fix the CleanNode constructor, which was not implemented correctly.
3. When generating upserts, allow more granular control over the number of inserts and upserts - zero or more inserts and upserts can be specified instead of always requiring both.
4. Fixed generation of records of specific size
- The current code was using a class variable "shouldAddMore" which was reset to false after the first record generation causing subsequent records to be of minimum size.
- In this change, we pre-calculate the extra size of the complex fields. When generating records, for complex fields we read the field size from this map.
5. Refresh the timeline of the DeltaSync service before calling readFromSource. This ensures that only the newest generated data is read and data generated in the older Dag Nodes is ignored (as their AVRO files will have an older timestamp).
6. Make --workload-generator-classname an optional parameter, since the default will most likely be used.
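The record-size fix in item 4 can be sketched roughly as follows. This is a minimal illustration, not the actual generator code: the class and method names are hypothetical, and splitting the extra bytes evenly across complex fields is an assumption, since the description above does not say how the extra size is distributed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and the even split are assumptions.
final class RecordSizeSketch {

    // Computed once, before any record is generated: the extra bytes
    // each complex field should carry to reach the target record size.
    static Map<String, Integer> extraSizePerField(List<String> complexFields,
                                                  int targetSize, int minSize) {
        Map<String, Integer> extra = new LinkedHashMap<>();
        if (complexFields.isEmpty()) {
            return extra;
        }
        int perField = Math.max(0, targetSize - minSize) / complexFields.size();
        for (String field : complexFields) {
            extra.put(field, perField);
        }
        return extra;
    }

    // Every record reads the same map, so there is no per-record flag
    // that could be reset after the first record.
    static int recordSize(int minSize, Map<String, Integer> extra) {
        int size = minSize;
        for (int bytes : extra.values()) {
            size += bytes;
        }
        return size;
    }
}
```

Because the map is built once and only read afterwards, the first record and every later record come out at the same size.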
This helps remove the reporter once the test has completed and prevents log pollution from unnecessary metric logs.
- Added an API to shutdown the metrics reporter after tests.
- This change breaks `hudi-client` into `hudi-client-common` and `hudi-spark-client` modules
- Simple usages of Spark via jsc.parallelize() have been redone using EngineContext#map, EngineContext#flatMap, etc.
- The code changes in this PR break classes into `BaseXYZ` parent classes with no Spark dependencies, living in `hudi-client-common`
- Classes in `hudi-spark-client` are named `SparkXYZ` and extend the parent classes, carrying all the Spark dependencies
- To simplify/cleanup, HoodieIndex#fetchRecordLocation has been removed and its usages in tests replaced with alternatives
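
The Base/Spark split described above can be sketched as below. This is an illustration under stated assumptions, not the code in this change: all class names are hypothetical, the method signatures are simplified, and plain Java streams stand in for the Spark-backed implementation so the sketch runs without Spark.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical engine abstraction with no engine-specific dependencies.
abstract class EngineContextSketch {
    abstract <I, O> List<O> map(List<I> data, Function<I, O> fn);

    abstract <I, O> List<O> flatMap(List<I> data, Function<I, List<O>> fn);
}

// Engine-specific subclass. A Spark-backed version would parallelize the
// input on a Spark context; Java streams are used here as a stand-in.
final class LocalEngineContextSketch extends EngineContextSketch {
    @Override
    <I, O> List<O> map(List<I> data, Function<I, O> fn) {
        return data.stream().map(fn).collect(Collectors.toList());
    }

    @Override
    <I, O> List<O> flatMap(List<I> data, Function<I, List<O>> fn) {
        return data.stream()
                .flatMap(item -> fn.apply(item).stream())
                .collect(Collectors.toList());
    }
}

// Engine-agnostic logic, analogous to a BaseXYZ class: it only sees the
// abstraction, never the engine behind it.
final class PathLengthsSketch {
    private final EngineContextSketch context;

    PathLengthsSketch(EngineContextSketch context) {
        this.context = context;
    }

    List<Integer> lengths(List<String> paths) {
        return context.map(paths, String::length);
    }
}
```

With this shape, the shared logic compiles against the common module alone, and swapping engines only means passing a different context implementation.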
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
Migrate deprecated APIs in HoodieTestUtils to HoodieTestTable for test classes
- TestClientRollback
- TestCopyOnWriteRollbackActionExecutor
Use FileCreateUtils APIs in CompactionTestUtils.
Then remove unused deprecated APIs after migration.
* [HUDI-995] Use HoodieTestTable in more classes
Migrate test data prep logic in
- TestStatsCommand
- TestHoodieROTablePathFilter
Re-implement methods for creating new commit times in HoodieTestUtils and HoodieClientTestHarness
- Move relevant APIs to HoodieTestTable
- Migrate usages
After changing to HoodieTestTable APIs, removed unused deprecated APIs in HoodieTestUtils
Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage
## Brief change log
Update the current value of only the fields that you want to change.
The default payload, OverwriteWithLatestAvroPayload, overwrites the whole record based on a
comparison of `orderingVal`. This doesn't meet our need when we only want to change specific fields.
For example (assuming each field's default value is null):
```
current Value
Field: name age gender
Value: karl 20 male
```
```
insert Value
Field: name age gender
Value: null 30 null
```
```
After insert:
Field: name age gender
Value: karl 30 male
```
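
The merge rule shown in the example can be sketched as follows. This is a minimal illustration, assuming records are modeled as plain maps rather than Avro records; the class and method names are hypothetical, and this is not the payload implementation itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: maps stand in for Avro records.
final class NonDefaultMergeSketch {

    // Copy an incoming field over the stored value only when the incoming
    // value differs from that field's default.
    static Map<String, Object> merge(Map<String, Object> current,
                                     Map<String, Object> incoming,
                                     Map<String, Object> defaults) {
        Map<String, Object> merged = new LinkedHashMap<>(current);
        for (Map.Entry<String, Object> field : incoming.entrySet()) {
            Object defaultValue = defaults.get(field.getKey());
            if (!Objects.equals(field.getValue(), defaultValue)) {
                merged.put(field.getKey(), field.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> current = new LinkedHashMap<>();
        current.put("name", "karl");
        current.put("age", 20);
        current.put("gender", "male");

        Map<String, Object> incoming = new LinkedHashMap<>();
        incoming.put("name", null);
        incoming.put("age", 30);
        incoming.put("gender", null);

        // All defaults are null, as in the example above.
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("name", null);
        defaults.put("age", null);
        defaults.put("gender", null);

        // prints {name=karl, age=30, gender=male}
        System.out.println(merge(current, incoming, defaults));
    }
}
```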
## Verify this pull request
Added TestOverwriteNonDefaultsWithLatestAvroPayload to verify the change.