[HUDI-4018][HUDI-4027] Adding integ test yamls for immutable use-cases. Added delete partition support to integ tests (#5501)
- Added pure immutable test yamls to integ test framework. Added SparkBulkInsertNode as part of it.
- Added delete_partition support to integ test framework using spark-datasource.
- Added a single yaml to test all non-core write operations (insert overwrite, insert overwrite table and delete partitions).
- Added tests for 4 concurrent spark datasource writers (multi-writer tests).
- Fixed readme w/ sample commands for multi-writer.
Commit 0cec955fa2 (parent ecd47e7aae)
Spark submit with the flag:

```
--saferSchemaEvolution
```
### Multi-writer tests
The integ test framework also supports multi-writer tests.
#### Multi-writer tests with Deltastreamer and a Spark datasource writer
Sample spark-submit command to run one Deltastreamer and one Spark datasource writer concurrently:
```shell
./bin/spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.0 \
--conf spark.task.cpus=3 --conf spark.executor.cores=3 \
--conf spark.task.maxFailures=100 --conf spark.memory.fraction=0.4 \
--conf spark.rdd.compress=true --conf spark.kryoserializer.buffer.max=2000m \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.memory.storageFraction=0.1 --conf spark.shuffle.service.enabled=true \
--conf spark.sql.hive.convertMetastoreParquet=false --conf spark.driver.maxResultSize=12g \
--conf spark.executor.heartbeatInterval=120s --conf spark.network.timeout=600s \
--conf spark.yarn.max.executor.failures=10 \
--conf spark.sql.catalogImplementation=hive \
--class org.apache.hudi.integ.testsuite.HoodieMultiWriterTestSuiteJob \
<HUDI_REPO_DIR>/packaging/hudi-integ-test-bundle/target/hudi-integ-test-bundle-0.12.0-SNAPSHOT.jar \
--source-ordering-field test_suite_source_ordering_field \
--use-deltastreamer \
--target-base-path /tmp/hudi/output \
--input-base-paths "/tmp/hudi/input1,/tmp/hudi/input2" \
--target-table table1 \
--props-paths "file:<HUDI_REPO_DIR>/docker/demo/config/test-suite/multi-writer-local-1.properties,file:<HUDI_REPO_DIR>/docker/demo/config/test-suite/multi-writer-local-2.properties" \
--schemaprovider-class org.apache.hudi.integ.testsuite.schema.TestSuiteFileBasedSchemaProvider \
--source-class org.apache.hudi.utilities.sources.AvroDFSSource \
--input-file-size 125829120 \
--workload-yaml-paths "file:<HUDI_REPO_DIR>/docker/demo/config/test-suite/multi-writer-1-ds.yaml,file:<HUDI_REPO_DIR>/docker/demo/config/test-suite/multi-writer-2-sds.yaml" \
--workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator \
--table-type COPY_ON_WRITE \
--compact-scheduling-minshare 1 \
--input-base-path "dummyValue" \
--workload-yaml-path "dummyValue" \
--props "dummyValue" \
--use-hudi-data-to-generate-updates
```
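Each file passed via `--workload-yaml-paths` describes a DAG of test nodes for one writer. As a rough illustration of the format (node names, record counts and the delay value below are made up, not the contents of the shipped `multi-writer-1-ds.yaml`), such a yaml might look like:

```yaml
# Illustrative sketch of a workload yaml for the integ test suite.
# The real files live under docker/demo/config/test-suite/; node types
# follow the suite's conventions, but all values here are assumptions.
dag_name: simple-insert-upsert
dag_rounds: 1
dag_intermittent_delay_mins: 1
dag_content:
  first_insert:
    config:
      record_size: 1000
      num_partitions_insert: 1
      repeat_count: 1
      num_records_insert: 1000
    type: InsertNode
    deps: none
  first_upsert:
    config:
      record_size: 1000
      num_partitions_upsert: 1
      repeat_count: 1
      num_records_upsert: 100
    type: UpsertNode
    deps: first_insert
```

Each node names its operation (`type`), its sizing knobs (`config`), and the node it waits on (`deps`), which is how the suite sequences work within a single writer.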
#### Multi-writer tests with 4 concurrent Spark datasource writers
```shell
./bin/spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.0 \
--conf spark.task.cpus=3 --conf spark.executor.cores=3 \
--conf spark.task.maxFailures=100 --conf spark.memory.fraction=0.4 \
--conf spark.rdd.compress=true --conf spark.kryoserializer.buffer.max=2000m \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.memory.storageFraction=0.1 --conf spark.shuffle.service.enabled=true \
--conf spark.sql.hive.convertMetastoreParquet=false --conf spark.driver.maxResultSize=12g \
--conf spark.executor.heartbeatInterval=120s --conf spark.network.timeout=600s \
--conf spark.yarn.max.executor.failures=10 --conf spark.sql.catalogImplementation=hive \
--class org.apache.hudi.integ.testsuite.HoodieMultiWriterTestSuiteJob \
<BUNDLE_LOCATION>/hudi-integ-test-bundle-0.12.0-SNAPSHOT.jar \
--source-ordering-field test_suite_source_ordering_field \
--use-deltastreamer \
--target-base-path /tmp/hudi/output \
--input-base-paths "/tmp/hudi/input1,/tmp/hudi/input2,/tmp/hudi/input3,/tmp/hudi/input4" \
--target-table table1 \
--props-paths "file:<PROPS_LOCATION>/multi-writer-local-1.properties,file:<PROPS_LOCATION>/multi-writer-local-2.properties,file:<PROPS_LOCATION>/multi-writer-local-3.properties,file:<PROPS_LOCATION>/multi-writer-local-4.properties" \
--schemaprovider-class org.apache.hudi.integ.testsuite.schema.TestSuiteFileBasedSchemaProvider \
--source-class org.apache.hudi.utilities.sources.AvroDFSSource \
--input-file-size 125829120 \
--workload-yaml-paths "file:<PROPS_LOCATION>/multi-writer-1-sds.yaml,file:<PROPS_LOCATION>/multi-writer-2-sds.yaml,file:<PROPS_LOCATION>/multi-writer-3-sds.yaml,file:<PROPS_LOCATION>/multi-writer-4-sds.yaml" \
--workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator \
--table-type COPY_ON_WRITE \
--compact-scheduling-minshare 1 \
--input-base-path "dummyValue" \
--workload-yaml-path "dummyValue" \
--props "dummyValue" \
--use-hudi-data-to-generate-updates
```
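Concurrent writers require optimistic concurrency control and an external lock provider, which is what the per-writer `multi-writer-local-*.properties` files are expected to configure. A sketch of the relevant settings (the ZooKeeper lock provider and its endpoint values below are assumptions for illustration; check the properties files shipped with the repo for the actual choices):

```properties
# Sketch only: multi-writer concurrency settings. The lock provider class
# and ZooKeeper endpoint values are illustrative assumptions.
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.cleaner.policy.failed.writes=LAZY
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=localhost
hoodie.write.lock.zookeeper.port=2181
hoodie.write.lock.zookeeper.lock_key=table1
hoodie.write.lock.zookeeper.base_path=/multi_writer_test
```

All writers must point at the same lock key and base path so that their commits are serialized against each other.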
## Automated tests for N yamls in a local Docker environment
Hudi provides a script to assist you in testing N yamls automatically. Check out the script under
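Conceptually, such a script just loops over the workload yamls and submits each one in turn. A minimal sketch of that idea (the directory default and the echoed command are placeholders, not Hudi's actual script):

```shell
#!/usr/bin/env bash
# Sketch: iterate over every workload yaml in a directory and print the
# spark-submit invocation that would run it. YAML_DIR and the echoed
# command are illustrative placeholders.
YAML_DIR=${1:-docker/demo/config/test-suite}
for yaml in "$YAML_DIR"/*.yaml; do
  [ -e "$yaml" ] || continue   # directory had no yamls; skip the literal glob
  echo "would run: spark-submit ... --workload-yaml-path file:$yaml"
done
```

Replacing the `echo` with the full spark-submit command from the sections above (and checking its exit code per yaml) turns this into a usable batch runner.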