Build a multi-node Kafka cluster on Docker Swarm

I found this Docker image for Kafka:

https://hub.docker.com/r/spotify/kafka/

and I can easily create a Docker container using the command documented there:

docker run -p 2181:2181 -p 9092:9092 --env ADVERTISED_HOST=`boot2docker ip` --env ADVERTISED_PORT=9092 spotify/kafka

This works, but I want to configure a multi-node Kafka cluster running on Docker Swarm.

How can I do that?

asked May 25 '16 at 05:05 by Knows Not Much



2 Answers

The previous approach raises some questions:

  1. How do you specify the IDs of the ZooKeeper nodes?
  2. How do you specify the ID of each Kafka node, and tell it where the ZooKeeper nodes are?

# kafka configs
echo "broker.id=${ID}
advertised.host.name=${NAME}
zookeeper.connect=${ZOOKEEPERS}" >> /opt/kafka/config/server.properties

The host names used here (${NAME} and the ZooKeeper hosts listed in ${ZOOKEEPERS}) must all be resolvable in the overlay network.
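As a minimal sketch of what the echo line produces for each broker, the hypothetical Python helper below renders the three properties from an ID, an advertised host name and a list of ZooKeeper host:port entries. The function name and the hostnames (kafka1, zk1, ...) are assumptions for illustration; in a real swarm they would be names resolvable on the overlay network.

```python
from typing import List


def broker_properties(broker_id: int, name: str, zookeepers: List[str]) -> str:
    """Render the lines the echo command appends to server.properties.

    Hypothetical helper for illustration only: each broker gets its own
    broker.id and advertised host name, while all brokers share the
    same ZooKeeper connection string.
    """
    # zookeeper.connect is a comma-separated list of host:port pairs
    zookeeper_connect = ",".join(zookeepers)
    # echo adds a trailing newline, so the rendered text ends with one too
    return (
        f"broker.id={broker_id}\n"
        f"advertised.host.name={name}\n"
        f"zookeeper.connect={zookeeper_connect}\n"
    )


if __name__ == "__main__":
    zookeepers = ["zk1:2181", "zk2:2181", "zk3:2181"]  # assumed hostnames
    for broker_id in range(1, 4):
        print(broker_properties(broker_id, f"kafka{broker_id}", zookeepers))
```

Only broker.id and advertised.host.name change from one broker to the next. Note that later Kafka releases deprecate advertised.host.name in favour of advertised.listeners, which is the option the other answer uses.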

Moreover, in the issue "Cannot create a Kafka service and publish ports due to rout mesh network", a comment advises against using the ingress network.

I think the best option is to define the services in a Docker Compose file and deploy it to the swarm as a stack. I'll edit the answer with an example.

answered Oct 24 '22 at 02:10 by Fabio Fumarola


There are 2 concerns to consider: networking and storage.

Since Kafka is a stateful service, and until cloud-native storage is figured out, it is advisable to use the global deployment mode. That is, each swarm node that satisfies the placement constraints runs exactly one Kafka container.

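Another recommendation is to use host mode for the published port, so that a connection to a node's published port reaches the broker running on that node instead of being load-balanced through the ingress routing mesh.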

It's also important to set the advertised listeners option correctly, so that each Kafka broker advertises the host it is actually running on. Use Swarm service templates (such as {{.Node.Hostname}}) to fill in the real hostname automatically.
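To illustrate what the template buys you, the sketch below mimics what Swarm does when it expands {{.Node.Hostname}} in the ADVERTISED_LISTENERS value of a global service: each node's broker advertises a listener naming that node. The plain string substitution is a simplified stand-in for Swarm's Go template engine, and the node names are assumptions.

```python
from typing import Dict, List

TEMPLATE = "PLAINTEXT://{{.Node.Hostname}}:11092"


def render_advertised_listeners(template: str, hostnames: List[str]) -> Dict[str, str]:
    """Return the advertised listener each node's broker would get.

    Simplified stand-in for Swarm service templates: only the
    {{.Node.Hostname}} placeholder is substituted.
    """
    return {host: template.replace("{{.Node.Hostname}}", host) for host in hostnames}


if __name__ == "__main__":
    listeners = render_advertised_listeners(TEMPLATE, ["node1", "node2", "node3"])
    for host, listener in listeners.items():
        print(host, "->", listener)
    # Strip the PLAINTEXT:// prefix to get a client bootstrap list
    print(",".join(listener.split("://", 1)[1] for listener in listeners.values()))
```

Because every broker advertises its own node's hostname, clients that fetch cluster metadata are sent to the right node, which is also why the port is published in host mode rather than through the routing mesh.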

Also make sure that the published port is different from the target port.

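version: '3.2'

services:
  # ZooKeeper is referenced below but was not shown in the original
  # answer; this service definition and image tag are assumptions.
  zookeeper:
    image: debezium/zookeeper:0.8
    networks:
      - kafka

  kafka:
    image: debezium/kafka:0.8
    volumes:
      # Bind mount: this path must exist on every node running a broker
      - ./kafka:/kafka/data
    environment:
      - ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_AUTO_CREATE_TOPICS_ENABLE=true
      - KAFKA_MAX_MESSAGE_BYTES=20000000
      - KAFKA_MESSAGE_MAX_BYTES=20000000
      - KAFKA_CLEANUP_POLICY=compact
      # Listen on 9092 inside the container
      - LISTENERS=PLAINTEXT://:9092
      # -1 lets Kafka generate a unique broker id for each broker
      - BROKER_ID=-1
      # Swarm expands the template to the hostname of the node running the task
      - ADVERTISED_LISTENERS=PLAINTEXT://{{.Node.Hostname}}:11092
    depends_on:           # ignored by docker stack deploy
      - zookeeper
    deploy:
      mode: global        # one broker per eligible node
    ports:
      - target: 9092      # container port
        published: 11092  # port opened on each node's host
        protocol: tcp
        mode: host        # bypass the ingress routing mesh
    networks:
      - kafka

networks:
  kafka:
    driver: overlay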

I can't explain all the options right now, but this is a configuration that works.
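To make these recommendations easier to check, here is a hypothetical Python checker that encodes them: global deploy mode, host-mode ports whose published port differs from the target port, and (as in the configuration above) an advertised listener built from {{.Node.Hostname}} that uses the published port. It works on the kafka service as a plain dict, roughly what a YAML parser would return; the function and its checks are my own illustration, not a Docker or Kafka tool.

```python
from typing import Any, Dict, List


def check_kafka_service(service: Dict[str, Any]) -> List[str]:
    """Return the recommendations a Kafka service definition violates.

    Hypothetical checker encoding the advice in this answer; it is not part
    of Docker or Kafka tooling. An empty list means all checks pass.
    """
    problems = []

    # One broker per eligible node
    if service.get("deploy", {}).get("mode") != "global":
        problems.append("deploy.mode should be 'global'")

    # Ports: host mode, published port different from target port
    published_ports = set()
    for port in service.get("ports", []):
        target, published = port.get("target"), port.get("published")
        if port.get("mode") != "host":
            problems.append(f"port {target} should use mode: host")
        if published == target:
            problems.append(f"port {target}: published port should differ from target")
        published_ports.add(published)

    # Environment in list form ("KEY=value"); entries without '=' get an empty value
    env = {
        key: value
        for key, _, value in (item.partition("=") for item in service.get("environment", []))
    }
    advertised = env.get("ADVERTISED_LISTENERS", "")
    if "{{.Node.Hostname}}" not in advertised:
        problems.append("ADVERTISED_LISTENERS should use the {{.Node.Hostname}} template")
    advertised_port = advertised.rsplit(":", 1)[-1]
    if not advertised_port.isdigit() or int(advertised_port) not in published_ports:
        problems.append("ADVERTISED_LISTENERS should advertise the published port")

    return problems


if __name__ == "__main__":
    # The kafka service from the compose file above, trimmed to the checked keys
    kafka = {
        "deploy": {"mode": "global"},
        "environment": [
            "LISTENERS=PLAINTEXT://:9092",
            "ADVERTISED_LISTENERS=PLAINTEXT://{{.Node.Hostname}}:11092",
        ],
        "ports": [{"target": 9092, "published": 11092, "protocol": "tcp", "mode": "host"}],
    }
    print(check_kafka_service(kafka))  # []
```

To run it against a real compose file you would load the YAML first (for example with a YAML library) and pass in the kafka entry under services.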

answered Oct 24 '22 at 01:10 by Vanuan