Quick Start guide for Kafka Connect Sink for Hudi

This repo contains a sample project that can be used to start off your own sink connector for Kafka Connect.

Building the connector

The first thing you need to do to start using this connector is to build it. Building requires the following dependencies:

- JDK 8 or later
- Apache Maven

After installing these dependencies, execute the following command:

cd $HUDI_DIR
mvn clean package

Incremental Builds

mvn clean -pl hudi-kafka-connect install -DskipTests
mvn clean -pl packaging/hudi-kafka-connect-bundle install

Put hudi connector in Kafka Connect classpath

cp $HUDI_DIR/packaging/hudi-kafka-connect-bundle/target/hudi-kafka-connect-bundle-0.10.0-SNAPSHOT.jar /usr/local/share/java/hudi-kafka-connect/
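Kafka Connect discovers connectors from the directories listed in the worker's plugin.path setting, so the directory the bundle jar was copied into must be covered by it. A minimal fragment for connect-distributed.properties might look like this (the path is the assumed install location from the cp command above; adjust it if you copied the jar elsewhere):

```properties
# Parent directory containing the hudi-kafka-connect plugin directory.
plugin.path=/usr/local/share/java
```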

Trying the connector

After building the package, we need to set up a local Apache Kafka installation to run the connector against.

1 - Starting the environment

Start ZooKeeper and Kafka from the Kafka installation directory:

./bin/zookeeper-server-start.sh ./config/zookeeper.properties
./bin/kafka-server-start.sh ./config/server.properties

Wait until the Kafka cluster is up and running.

2 - Create the Hudi control topic for coordinating transactions

The control topic must have exactly 1 partition, so that all control/coordinator events are totally ordered.

./bin/kafka-topics.sh --delete --topic hudi-control-topic --bootstrap-server localhost:9092
./bin/kafka-topics.sh --create --topic hudi-control-topic --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092

3 - Create the Hudi Topic for the Sink and insert data into the topic

Open a terminal to execute the following command:

bash runKafkaTrafficGenerator.sh <total_messages>
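The generator script publishes a stream of test messages into the sink topic. As a rough illustration of the kind of payload involved (not the script's actual contents), it amounts to emitting newline-delimited JSON records, which could then be piped into kafka-console-producer:

```shell
# Hypothetical sketch: emit <total_messages> JSON records, one per line.
# Field names here are illustrative, not the generator's real schema.
total_messages=5
for i in $(seq 1 "$total_messages"); do
  printf '{"id": %d, "ts": %d}\n' "$i" "$(date +%s)"
done
```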

4 - Run the Sink connector worker (multiple workers can be run)

Open a terminal to execute the following command:

./bin/connect-distributed.sh ../hudi-kafka-connect/configs/connect-distributed.properties

5 - Add the Hudi sink to the Connect cluster (delete it first if you want to re-configure)

curl -X DELETE http://localhost:8083/connectors/hudi-sink
curl -X POST -H "Content-Type:application/json" -d @$HUDI_DIR/hudi-kafka-connect/configs/config-sink.json http://localhost:8083/connectors
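The actual sink settings live in configs/config-sink.json. Structurally, a Kafka Connect sink registration payload has the following shape; the values below are illustrative placeholders, not the real contents of that file:

```json
{
  "name": "hudi-sink",
  "config": {
    "connector.class": "<Hudi sink connector class>",
    "tasks.max": "1",
    "topics": "<sink topic name>"
  }
}
```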