
[MINOR] Add more configuration to Kafka setup script (#3992)

* [MINOR] Add more configuration to Kafka setup script

* Add option to reuse Kafka topic

* Minor fixes to README
Ethan Guo
2021-11-22 18:03:38 -08:00
committed by GitHub
parent e22150fe15
commit 6aa710eae0
2 changed files with 70 additions and 34 deletions


@@ -61,7 +61,7 @@ Once downloaded and built, run the Zookeeper server and Kafka server using the c
 ```bash
 export KAFKA_HOME=/path/to/kafka_install_dir
-cd $KAFKA_KAFKA_HOME
+cd $KAFKA_HOME
 ./bin/zookeeper-server-start.sh ./config/zookeeper.properties
 ./bin/kafka-server-start.sh ./config/server.properties
 ```
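An aside, not part of the patch: once both servers are started, one way to confirm the broker is reachable is to list topics against the default listener. This assumes the stock `config/server.properties`, which listens on `localhost:9092`:

```shell
# Sanity check against a running broker: prints the (initially empty) topic list.
# Assumes the default listener localhost:9092 from the stock server.properties.
$KAFKA_HOME/bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
```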
@@ -71,8 +71,9 @@ Wait until the kafka cluster is up and running.
 ### 2 - Set up the schema registry
 Hudi leverages schema registry to obtain the latest schema when writing records. While it supports most popular schema
-registries, we use Confluent schema registry. Download the latest confluent platform and run the schema registry
-service.
+registries, we use Confluent schema registry. Download the
+latest [confluent platform](https://docs.confluent.io/platform/current/installation/index.html) and run the schema
+registry service.
 ```bash
 cd $CONFLUENT_DIR
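Another aside to the patch: the schema registry exposes a REST API that can be used to confirm the service came up. This sketch assumes the registry's default listener on port 8081:

```shell
# Queries a running Confluent Schema Registry for its registered subjects;
# on a fresh instance this returns an empty JSON array.
curl -s http://localhost:8081/subjects
```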
@@ -98,6 +99,13 @@ cd $HUDI_DIR/hudi-kafka-connect/demo/
 bash setupKafka.sh -n <total_kafka_messages>
 ```
+To generate data for long-running tests, you can add the `-b` option to specify the number of batches of data
+to generate, with each batch containing a number of messages and idle time between batches, as follows:
+```bash
+bash setupKafka.sh -n <num_kafka_messages_per_batch> -b <num_batches>
+```
 ### 4 - Run the Sink connector worker (multiple workers can be run)
 The Kafka connect is a distributed platform, with the ability to run one or more workers (each running multiple tasks)
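As an aside on the batching option added above: combining `-n` and `-b` yields `num_batches × msgs_per_batch` messages overall. A minimal sketch of that arithmetic, with variable names that are illustrative rather than taken from `setupKafka.sh` itself:

```shell
# Illustrative sketch only: variable names are hypothetical, not from setupKafka.sh.
msgs_per_batch=100   # value that would be passed via -n
num_batches=3        # value that would be passed via -b
total=0
for _ in $(seq 1 "$num_batches"); do
  # each iteration stands in for one batch of messages plus the idle gap
  total=$((total + msgs_per_batch))
done
echo "total messages: $total"
```

So, for example, `-n 100 -b 3` produces 300 messages in total, spread across three batches.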