[HUDI-2267] Update docs and infra test configs, add support for graphite (#3482)

Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
Satish M
2021-09-17 19:40:15 +05:30
committed by GitHub
parent 3a150ee181
commit c7a5c8273b
7 changed files with 114 additions and 23 deletions


@@ -177,7 +177,7 @@ cd /opt
Copy the integration tests jar into the docker container
```
docker cp packaging/hudi-integ-test-bundle/target/hudi-integ-test-bundle-0.8.0-SNAPSHOT.jar adhoc-2:/opt
docker cp packaging/hudi-integ-test-bundle/target/hudi-integ-test-bundle-0.10.0-SNAPSHOT.jar adhoc-2:/opt
```
```
@@ -214,21 +214,29 @@ spark-submit \
--conf spark.network.timeout=600s \
--conf spark.yarn.max.executor.failures=10 \
--conf spark.sql.catalogImplementation=hive \
--conf spark.driver.extraClassPath=/var/demo/jars/* \
--conf spark.executor.extraClassPath=/var/demo/jars/* \
--class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob \
/opt/hudi-integ-test-bundle-0.8.0-SNAPSHOT.jar \
/opt/hudi-integ-test-bundle-0.10.0-SNAPSHOT.jar \
--source-ordering-field test_suite_source_ordering_field \
--use-deltastreamer \
--target-base-path /user/hive/warehouse/hudi-integ-test-suite/output \
--input-base-path /user/hive/warehouse/hudi-integ-test-suite/input \
--target-table table1 \
--props file:/var/hoodie/ws/docker/demo/config/test-suite/test.properties \
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
--schemaprovider-class org.apache.hudi.integ.testsuite.schema.TestSuiteFileBasedSchemaProvider \
--source-class org.apache.hudi.utilities.sources.AvroDFSSource \
--input-file-size 125829120 \
--workload-yaml-path file:/var/hoodie/ws/docker/demo/config/test-suite/complex-dag-cow.yaml \
--workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator \
--table-type COPY_ON_WRITE \
--compact-scheduling-minshare 1
--compact-scheduling-minshare 1 \
--hoodie-conf hoodie.metrics.on=true \
--hoodie-conf hoodie.metrics.reporter.type=GRAPHITE \
--hoodie-conf hoodie.metrics.graphite.host=graphite \
--hoodie-conf hoodie.metrics.graphite.port=2003 \
--clean-input \
--clean-output
```
Or a Merge-on-Read job:
@@ -253,23 +261,44 @@ spark-submit \
--conf spark.network.timeout=600s \
--conf spark.yarn.max.executor.failures=10 \
--conf spark.sql.catalogImplementation=hive \
--conf spark.driver.extraClassPath=/var/demo/jars/* \
--conf spark.executor.extraClassPath=/var/demo/jars/* \
--class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob \
/opt/hudi-integ-test-bundle-0.8.0-SNAPSHOT.jar \
/opt/hudi-integ-test-bundle-0.10.0-SNAPSHOT.jar \
--source-ordering-field test_suite_source_ordering_field \
--use-deltastreamer \
--target-base-path /user/hive/warehouse/hudi-integ-test-suite/output \
--input-base-path /user/hive/warehouse/hudi-integ-test-suite/input \
--target-table table1 \
--props file:/var/hoodie/ws/docker/demo/config/test-suite/test.properties \
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
--schemaprovider-class org.apache.hudi.integ.testsuite.schema.TestSuiteFileBasedSchemaProvider \
--source-class org.apache.hudi.utilities.sources.AvroDFSSource \
--input-file-size 125829120 \
--workload-yaml-path file:/var/hoodie/ws/docker/demo/config/test-suite/complex-dag-mor.yaml \
--workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator \
--table-type MERGE_ON_READ \
--compact-scheduling-minshare 1
--compact-scheduling-minshare 1 \
--hoodie-conf hoodie.metrics.on=true \
--hoodie-conf hoodie.metrics.reporter.type=GRAPHITE \
--hoodie-conf hoodie.metrics.graphite.host=graphite \
--hoodie-conf hoodie.metrics.graphite.port=2003 \
--clean-input \
--clean-output
```
## Visualize and inspect the hoodie metrics and performance (local)
A Graphite server is already set up and running via ```docker/setup_demo.sh```.
Open a browser and access the metrics at
```
http://localhost:80
```
The dashboard is available at
```
http://localhost/dashboard
```
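To check that the Graphite server accepts data before running a full job, a hand-written data point can be sent to it. The following is a minimal sketch, not part of the test suite: `graphite_line` is a hypothetical helper, the metric name `hudi.smoke.test` is made up, and the commented-out `nc` call assumes that port 2003 of the `graphite` container is reachable from the host as `localhost:2003`, which the demo setup may or may not provide.

```shell
# Hypothetical helper: formats one data point in Graphite's plaintext
# protocol, which is "<metric path> <value> <epoch seconds>" per line.
graphite_line() {
  # $1 = metric path, $2 = value, $3 = epoch seconds (defaults to now)
  printf '%s %s %s\n' "$1" "$2" "${3:-$(date +%s)}"
}

# Example usage (assumes port 2003 is reachable on localhost):
# graphite_line hudi.smoke.test 1 | nc -w 1 localhost 2003
```

If the data point arrives, the made-up metric should show up in the metric tree of the web UI shortly afterwards.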
## Running long running test suite in Local Docker environment
For a long-running test suite, validation has to be done differently. The idea is to run the same dag repeatedly for
@@ -279,12 +308,12 @@ contents both via spark datasource and hive table via spark sql engine. Hive val
If you have "ValidateDatasetNode" in your dag, do not replace hive jars as instructed above. The Spark SQL engine does not
work well with hive2* jars, so after running the docker setup, follow the steps below.
```
docker cp packaging/hudi-integ-test-bundle/target/hudi-integ-test-bundle-0.8.0-SNAPSHOT.jar adhoc-2:/opt/
docker cp demo/config/test-suite/test.properties adhoc-2:/opt/
docker cp packaging/hudi-integ-test-bundle/target/hudi-integ-test-bundle-0.10.0-SNAPSHOT.jar adhoc-2:/opt/
docker cp docker/demo/config/test-suite/test.properties adhoc-2:/opt/
```
Also copy your dag of interest to adhoc-2:/opt/
```
docker cp demo/config/test-suite/complex-dag-cow.yaml adhoc-2:/opt/
docker cp docker/demo/config/test-suite/complex-dag-cow.yaml adhoc-2:/opt/
```
For repeated runs, two additional configs need to be set: "dag_rounds" and "dag_intermittent_delay_mins".
@@ -428,7 +457,7 @@ spark-submit \
--conf spark.driver.extraClassPath=/var/demo/jars/* \
--conf spark.executor.extraClassPath=/var/demo/jars/* \
--class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob \
/opt/hudi-integ-test-bundle-0.8.0-SNAPSHOT.jar \
/opt/hudi-integ-test-bundle-0.10.0-SNAPSHOT.jar \
--source-ordering-field test_suite_source_ordering_field \
--use-deltastreamer \
--target-base-path /user/hive/warehouse/hudi-integ-test-suite/output \
@@ -446,6 +475,14 @@ spark-submit \
--clean-output
```
If you wish to enable metrics, add the following properties to the spark-submit command as well:
```
--hoodie-conf hoodie.metrics.on=true \
--hoodie-conf hoodie.metrics.reporter.type=GRAPHITE \
--hoodie-conf hoodie.metrics.graphite.host=graphite \
--hoodie-conf hoodie.metrics.graphite.port=2003 \
```
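When metrics need to be switched on and off between runs, the four properties above can be kept in one place. This is only a sketch: `metrics_flags` is a hypothetical helper that is not part of Hudi or of the test suite, and it assumes the Graphite host and port shown above.

```shell
# Hypothetical helper: prints the Graphite metrics flags, one
# "--hoodie-conf key=value" pair per line, or nothing when disabled.
metrics_flags() {
  if [ "${1:-false}" = "true" ]; then
    # printf reuses the format for each following pair of arguments.
    printf '%s %s\n' \
      --hoodie-conf hoodie.metrics.on=true \
      --hoodie-conf hoodie.metrics.reporter.type=GRAPHITE \
      --hoodie-conf hoodie.metrics.graphite.host=graphite \
      --hoodie-conf hoodie.metrics.graphite.port=2003
  fi
}

# Example usage: append the flags to the end of a spark-submit command.
# The substitution is left unquoted on purpose so the shell splits it into words.
# spark-submit ... --clean-output $(metrics_flags true)
```

Calling `metrics_flags false`, or calling it with no argument, prints nothing, so the same command line works with metrics disabled.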
A few ready-to-use dags are available under docker/demo/config/test-suite/ that can give you an idea of what long-running
dags look like.
```