[HUDI-4084] Add support to test async table services with integ test suite framework (#5557)
* Add support to test async table services with integ test suite framework
* Make await time for validation configurable
parent 47b764ec33
commit af1128acf9
@@ -593,6 +593,56 @@ Sample spark-submit command to test one delta streamer and a spark data source w
--use-hudi-data-to-generate-updates
```
### Testing async table services

We can test async table services with deltastreamer using the command below. Three additional arguments are required to test async table services compared to the previous command:

```shell
--continuous \
--test-continuous-mode \
--min-sync-interval-seconds 20
```

Here is the full command:
```shell
./bin/spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.4 \
--conf spark.task.cpus=1 --conf spark.executor.cores=1 \
--conf spark.task.maxFailures=100 \
--conf spark.memory.fraction=0.4 \
--conf spark.rdd.compress=true \
--conf spark.kryoserializer.buffer.max=2000m \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.memory.storageFraction=0.1 \
--conf spark.shuffle.service.enabled=true \
--conf spark.sql.hive.convertMetastoreParquet=false \
--conf spark.driver.maxResultSize=12g \
--conf spark.executor.heartbeatInterval=120s \
--conf spark.network.timeout=600s \
--conf spark.yarn.max.executor.failures=10 \
--conf spark.sql.catalogImplementation=hive \
--class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob <PATH_TO_BUNDLE>/hudi-integ-test-bundle-0.12.0-SNAPSHOT.jar \
--source-ordering-field test_suite_source_ordering_field \
--use-deltastreamer \
--target-base-path /tmp/hudi/output \
--input-base-path /tmp/hudi/input \
--target-table table1 \
-props file:/tmp/test.properties \
--schemaprovider-class org.apache.hudi.integ.testsuite.schema.TestSuiteFileBasedSchemaProvider \
--source-class org.apache.hudi.utilities.sources.AvroDFSSource \
--input-file-size 125829120 \
--workload-yaml-path file:/tmp/simple-deltastreamer.yaml \
--workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator \
--table-type COPY_ON_WRITE \
--compact-scheduling-minshare 1 \
--clean-input \
--clean-output \
--continuous \
--test-continuous-mode \
--min-sync-interval-seconds 20
```
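
The command references a properties file via `-props file:/tmp/test.properties`. As a rough sketch (not taken from this commit), such a file might combine the usual deltastreamer keys with configs that move table services off the write path; the field names, paths, and schema file locations below are placeholders to replace for your dataset:

```properties
# Illustrative placeholders -- adjust record key, partition path, and schema paths for your data
hoodie.datasource.write.recordkey.field=_row_key
hoodie.datasource.write.partitionpath.field=partition_path
hoodie.deltastreamer.source.dfs.root=/tmp/hudi/input
hoodie.deltastreamer.schemaprovider.source.schema.file=file:/tmp/source.avsc
hoodie.deltastreamer.schemaprovider.target.schema.file=file:/tmp/source.avsc

# Run table services asynchronously rather than inline with each commit
hoodie.compact.inline=false
hoodie.clean.async=true
hoodie.clustering.async.enabled=true
```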

We can use any YAML workload file and properties file with the above spark-submit command to test deltastreamer with async table services.

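
For reference, a minimal workload YAML in the integ test suite's DAG format could look like the sketch below; node names, record counts, and sizes are illustrative assumptions, not values from this commit:

```yaml
dag_name: simple-async-test
dag_rounds: 1
dag_intermittent_delay_mins: 1
dag_content:
  first_insert:
    config:
      record_size: 1000
      num_partitions_insert: 1
      repeat_count: 1
      num_records_insert: 1000
    type: InsertNode
    deps: none
  first_upsert:
    config:
      record_size: 1000
      num_partitions_upsert: 1
      num_records_upsert: 100
    type: UpsertNode
    deps: first_insert
```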
## Automated tests for N number of yamls in local Docker environment