# Docker Demo for Hudi This repo contains the docker demo resources for building docker demo images, set up the demo, and running Hudi in the docker demo environment. ## Repo Organization ### Configs for assembling docker images - `/hoodie` The `/hoodie` folder contains all the configs for assembling necessary docker images. The name and repository of each docker image, e.g., `apachehudi/hudi-hadoop_2.8.4-trinobase_368`, is defined in the maven configuration file `pom.xml`. ### Docker compose config for the Demo - `/compose` The `/compose` folder contains the yaml file to compose the Docker environment for running Hudi Demo. ### Resources and Sample Data for the Demo - `/demo` The `/demo` folder contains useful resources and sample data use for the Demo. ## Build and Test Image locally To build all docker images locally, you can run the script: ```shell ./build_local_docker_images.sh ``` To build a single image target, you can run ```shell mvn clean pre-integration-test -DskipTests -Ddocker.compose.skip=true -Ddocker.build.skip=false -pl : -am # For example, to build hudi-hadoop-trinobase-docker mvn clean pre-integration-test -DskipTests -Ddocker.compose.skip=true -Ddocker.build.skip=false -pl :hudi-hadoop-trinobase-docker -am ``` Alternatively, you can use `docker` cli directly under `hoodie/hadoop`. Note that, you need to manually name your local image by using `-t` option to match the naming in the `pom.xml`, so that you can update the corresponding image repository in Docker Hub (detailed steps in the next section). ```shell # Run under hoodie/hadoop, the is optional, "latest" by default docker build -t /[:] # For example, to build trinobase docker build trinobase -t apachehudi/hudi-hadoop_2.8.4-trinobase_368 ``` After new images are built, you can run the following script to bring up docker demo with your local images: ```shell ./setup_demo.sh dev ``` ## Upload Updated Image to Repository on Docker Hub Once you have built the updated image locally, you can push the corresponding this repository of the image to the Docker Hud registry designated by its name or tag: ```shell docker push /: # For example docker push apachehudi/hudi-hadoop_2.8.4-trinobase_368 ``` You can also easily push the image to the Docker Hub using Docker Desktop app: go to `Images`, search for the image by the name, and then click on the three dots and `Push to Hub`. ![Push to Docker Hub](push_to_docker_hub.png) Note that you need to ask for permission to upload the Hudi Docker Demo images to the repositories. You can find more information on [Docker Hub Repositories Manual](https://docs.docker.com/docker-hub/repos/). ## Docker Demo Setup Please refer to the [Docker Demo Docs page](https://hudi.apache.org/docs/docker_demo). ## Building Multi-Arch Images NOTE: The steps below require some code changes. Support for multi-arch builds in a fully automated manner is being tracked by [HUDI-3601](https://issues.apache.org/jira/browse/HUDI-3601). By default, the docker images are built for x86_64 (amd64) architecture. Docker `buildx` allows you to build multi-arch images, link them together with a manifest file, and push them all to a registry – with a single command. Let's say we want to build for arm64 architecture. First we need to ensure that `buildx` setup is done locally. Please follow the below steps (referred from https://www.docker.com/blog/multi-arch-images): ``` # List builders ~ ❯❯❯ docker buildx ls NAME/NODE DRIVER/ENDPOINT STATUS PLATFORMS default * docker default default running linux/amd64, linux/arm64, linux/arm/v7, linux/arm/v6 # If you are using the default builder, which is basically the old builder, then do following ~ ❯❯❯ docker buildx create --name mybuilder mybuilder ~ ❯❯❯ docker buildx use mybuilder ~ ❯❯❯ docker buildx inspect --bootstrap [+] Building 2.5s (1/1) FINISHED => [internal] booting buildkit 2.5s => => pulling image moby/buildkit:master 1.3s => => creating container buildx_buildkit_mybuilder0 1.2s Name: mybuilder Driver: docker-container Nodes: Name: mybuilder0 Endpoint: unix:///var/run/docker.sock Status: running Platforms: linux/amd64, linux/arm64, linux/arm/v7, linux/arm/v6 ``` Now goto `/docker/hoodie/hadoop` and change the `Dockerfile` to pull dependent images corresponding to arm64. For example, in [base/Dockerfile](./hoodie/hadoop/base/Dockerfile) (which pulls jdk8 image), change the line `FROM openjdk:8u212-jdk-slim-stretch` to `FROM arm64v8/openjdk:8u212-jdk-slim-stretch`. Then, from under `/docker/hoodie/hadoop` directory, execute the following command to build as well as push the image to the dockerhub repo: ``` # Run under hoodie/hadoop, the is optional, "latest" by default docker buildx build --platform -t /[:] --push # For example, to build base image docker buildx build base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1 --push ``` Once the base image is pushed then you could do something similar for other images. Change [hive](./hoodie/hadoop/hive_base/Dockerfile) dockerfile to pull the base image with tag corresponding to linux/arm64 platform. ``` # Change below line in the Dockerfile FROM apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:latest # as shown below FROM --platform=linux/arm64 apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:linux-arm64-0.10.1 # and then build & push from under hoodie/hadoop dir docker buildx build hive_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3:linux-arm64-0.10.1 --push ``` Similarly, for images that are dependent on hive (e.g. [base spark](./hoodie/hadoop/spark_base/Dockerfile) , [sparkmaster](./hoodie/hadoop/sparkmaster/Dockerfile), [sparkworker](./hoodie/hadoop/sparkworker/Dockerfile) and [sparkadhoc](./hoodie/hadoop/sparkadhoc/Dockerfile)), change the corresponding Dockerfile to pull the base hive image with tag corresponding to arm64. Then build and push using `docker buildx` command. For the sake of completeness, here is a [patch](https://gist.github.com/xushiyan/cec16585e884cf0693250631a1d10ec2) which shows what changes to make in Dockerfiles (assuming tag is named `linux-arm64-0.10.1`), and below is the list of `docker buildx` commands. ``` docker buildx build base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1 --push docker buildx build datanode --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-datanode:linux-arm64-0.10.1 --push docker buildx build historyserver --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-history:linux-arm64-0.10.1 --push docker buildx build hive_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3:linux-arm64-0.10.1 --push docker buildx build namenode --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-namenode:linux-arm64-0.10.1 --push docker buildx build prestobase --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:linux-arm64-0.10.1 --push docker buildx build spark_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkbase_2.4.4:linux-arm64-0.10.1 --push docker buildx build sparkadhoc --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4:linux-arm64-0.10.1 --push docker buildx build sparkmaster --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_2.4.4:linux-arm64-0.10.1 --push docker buildx build sparkworker --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_2.4.4:linux-arm64-0.10.1 --push ``` Once all the required images are pushed to the dockerhub repos, then we need to do one additional change in [docker compose](./compose/docker-compose_hadoop284_hive233_spark244.yml) file. Apply [this patch](https://gist.github.com/codope/3dd986de5e54f0650dd74b6032e4456c) to the docker compose file so that [setup_demo](./setup_demo.sh) pulls images with the correct tag for arm64. And now we should be ready to run the setup script and follow the docker demo.