1
0

[HUDI-3602][DOCS] Update docker README to build multi-arch images using buildx (#5011)

This commit is contained in:
Sagar Sumit
2022-03-10 16:08:27 +05:30
committed by GitHub
parent ec24407191
commit 4e09545be4

View File

@@ -91,3 +91,97 @@ You can find more information on [Docker Hub Repositories Manual](https://docs.d
## Docker Demo Setup
Please refer to the [Docker Demo Docs page](https://hudi.apache.org/docs/docker_demo).
## Building Multi-Arch Images
NOTE: The steps below require some code changes. Support for multi-arch builds in a fully automated manner is being
tracked by [HUDI-3601](https://issues.apache.org/jira/browse/HUDI-3601).
By default, the docker images are built for x86_64 (amd64) architecture. Docker `buildx` allows you to build multi-arch
images, link them together with a manifest file, and push them all to a registry with a single command. Let's say we
want to build for arm64 architecture. First we need to ensure that `buildx` setup is done locally. Please follow the
below steps (referred from https://www.docker.com/blog/multi-arch-images):
```
# List builders
~ docker buildx ls
NAME/NODE DRIVER/ENDPOINT STATUS PLATFORMS
default * docker
default default running linux/amd64, linux/arm64, linux/arm/v7, linux/arm/v6
# If you are using the default builder, which is basically the old builder, then do following
~ docker buildx create --name mybuilder
mybuilder
~ docker buildx use mybuilder
~ docker buildx inspect --bootstrap
[+] Building 2.5s (1/1) FINISHED
=> [internal] booting buildkit 2.5s
=> => pulling image moby/buildkit:master 1.3s
=> => creating container buildx_buildkit_mybuilder0 1.2s
Name: mybuilder
Driver: docker-container
Nodes:
Name: mybuilder0
Endpoint: unix:///var/run/docker.sock
Status: running
Platforms: linux/amd64, linux/arm64, linux/arm/v7, linux/arm/v6
```
Now goto `<HUDI_REPO_DIR>/docker/hoodie/hadoop` and change the `Dockerfile` to pull dependent images corresponding to
arm64. For example, in [base/Dockerfile](./hoodie/hadoop/base/Dockerfile) (which pulls jdk8 image), change the
line `FROM openjdk:8u212-jdk-slim-stretch` to `FROM arm64v8/openjdk:8u212-jdk-slim-stretch`.
Then, from under `<HUDI_REPO_DIR>/docker/hoodie/hadoop` directory, execute the following command to build as well as
push the image to the dockerhub repo:
```
# Run under hoodie/hadoop, the <tag> is optional, "latest" by default
docker buildx build <image_folder_name> --platform <comma-separated,platforms> -t <hub-user>/<repo-name>[:<tag>] --push
# For example, to build base image
docker buildx build base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1 --push
```
Once the base image is pushed then you could do something similar for other images.
Change [hive](./hoodie/hadoop/hive_base/Dockerfile) dockerfile to pull the base image with tag corresponding to
linux/arm64 platform.
```
# Change below line in the Dockerfile
FROM apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:latest
# as shown below
FROM --platform=linux/arm64 apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:linux-arm64-0.10.1
# and then build & push from under hoodie/hadoop dir
docker buildx build hive_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3:linux-arm64-0.10.1 --push
```
Similarly, for images that are dependent on hive (e.g. [base spark](./hoodie/hadoop/spark_base/Dockerfile)
, [sparkmaster](./hoodie/hadoop/sparkmaster/Dockerfile), [sparkworker](./hoodie/hadoop/sparkworker/Dockerfile)
and [sparkadhoc](./hoodie/hadoop/sparkadhoc/Dockerfile)), change the corresponding Dockerfile to pull the base hive
image with tag corresponding to arm64. Then build and push using `docker buildx` command.
For the sake of completeness, here is a [patch](https://gist.github.com/xushiyan/cec16585e884cf0693250631a1d10ec2) which
shows what changes to make in Dockerfiles (assuming tag is named `linux-arm64-0.10.1`), and below is the list
of `docker buildx` commands.
```
docker buildx build base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1 --push
docker buildx build datanode --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-datanode:linux-arm64-0.10.1 --push
docker buildx build historyserver --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-history:linux-arm64-0.10.1 --push
docker buildx build hive_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3:linux-arm64-0.10.1 --push
docker buildx build namenode --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-namenode:linux-arm64-0.10.1 --push
docker buildx build prestobase --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:linux-arm64-0.10.1 --push
docker buildx build spark_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkbase_2.4.4:linux-arm64-0.10.1 --push
docker buildx build sparkadhoc --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4:linux-arm64-0.10.1 --push
docker buildx build sparkmaster --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_2.4.4:linux-arm64-0.10.1 --push
docker buildx build sparkworker --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_2.4.4:linux-arm64-0.10.1 --push
```
Once all the required images are pushed to the dockerhub repos, then we need to do one additional change
in [docker compose](./compose/docker-compose_hadoop284_hive233_spark244.yml) file.
Apply [this patch](https://gist.github.com/codope/3dd986de5e54f0650dd74b6032e4456c) to the docker compose file so
that [setup_demo](./setup_demo.sh) pulls images with the correct tag for arm64. And now we should be ready to run the
setup script and follow the docker demo.