
Multi-broker Kafka on Kubernetes: how to set KAFKA_ADVERTISED_HOST_NAME

My current Kafka deployment file with 3 Kafka brokers looks like this:

apiVersion: apps/v1  # apps/v1beta1 in the original post; that API version was removed in Kubernetes 1.16
kind: StatefulSet
metadata:
  name: kafka
spec:
  selector:
    matchLabels:
      app: kafka
  serviceName: kafka-headless
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
      - name: kafka-instance
        image: wurstmeister/kafka
        ports:
        - containerPort: 9092
        env:
        - name: KAFKA_ADVERTISED_PORT
          value: "9092"
        - name: KAFKA_ADVERTISED_HOST_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: "zookeeper-0.zookeeper-headless.default.svc.cluster.local:2181,\
                  zookeeper-1.zookeeper-headless.default.svc.cluster.local:2181,\
                  zookeeper-2.zookeeper-headless.default.svc.cluster.local:2181"
        - name: BROKER_ID_COMMAND
          value: "hostname | awk -F '-' '{print $2}'"
        - name: KAFKA_CREATE_TOPICS
          value: hello:2:1
        volumeMounts:
        - name: data
          mountPath: /var/lib/kafka/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 50Gi

This creates 3 Kafka brokers as a StatefulSet. The brokers connect to the ZooKeeper cluster through the cluster DNS service (kube-dns), using fully qualified domain names (FQDNs) such as:

zookeeper-0.zookeeper-headless.default.svc.cluster.local:2181
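
The names follow the pattern pod-name.headless-service.namespace.svc.cluster.local, so the connect string can be generated rather than typed by hand. The following is a minimal sketch, assuming the default cluster domain cluster.local; the function name zk_connect and its argument order are illustrative and not part of any Kafka or Kubernetes tooling.

```shell
#!/bin/sh
# Build a comma-separated ZooKeeper connect string for a StatefulSet
# that sits behind a headless service.
# Arguments (hypothetical helper): replica count, StatefulSet name,
# headless service name, namespace, client port.
zk_connect() {
  replicas=$1 name=$2 service=$3 namespace=$4 port=$5
  i=0
  out=""
  while [ "$i" -lt "$replicas" ]; do
    host="$name-$i.$service.$namespace.svc.cluster.local:$port"
    # Join entries with commas, without emitting a leading comma.
    out="${out:+$out,}$host"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# Values taken from the StatefulSet above.
zk_connect 3 zookeeper zookeeper-headless default 2181
```

With the values above, this prints the same three-host string that is set in KAFKA_ZOOKEEPER_CONNECT.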

Broker IDs are generated based on the pod name:

- name: BROKER_ID_COMMAND
  value: "hostname | awk -F '-' '{print $2}'"

Result:

kafka-0 = 0
kafka-1 = 1
kafka-2 = 2
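
The command can be checked outside the cluster by feeding it sample pod names. The sketch below wraps the original awk expression in a function and adds a variant that takes the last field instead of the second; the function names are illustrative. The variant is only a suggestion for the case where the StatefulSet name itself contains a hyphen, which is not the case in this setup.

```shell
#!/bin/sh
# Original approach: second '-'-separated field of the pod name.
broker_id_field2() {
  printf '%s\n' "$1" | awk -F '-' '{print $2}'
}

# Variant: last field, which still yields the ordinal when the
# StatefulSet name contains a hyphen (e.g. my-kafka).
broker_id_last() {
  printf '%s\n' "$1" | awk -F '-' '{print $NF}'
}

broker_id_field2 kafka-2      # prints 2
broker_id_last kafka-2        # prints 2
broker_id_field2 my-kafka-2   # prints kafka (not a usable broker ID)
broker_id_last my-kafka-2     # prints 2
```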

However, in order to advertise the brokers under their kube-dns names:

kafka-0.kafka-headless.default.svc.cluster.local:9092
kafka-1.kafka-headless.default.svc.cluster.local:9092
kafka-2.kafka-headless.default.svc.cluster.local:9092

I need to be able to set the KAFKA_ADVERTISED_HOST_NAME variable to the above FQDN values based on the name of the pod.

Currently I have the variable set to the name of the pod:

- name: KAFKA_ADVERTISED_HOST_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name

Result:

KAFKA_ADVERTISED_HOST_NAME=kafka-0
KAFKA_ADVERTISED_HOST_NAME=kafka-1
KAFKA_ADVERTISED_HOST_NAME=kafka-2

But I still need to append the rest of the DNS name somehow.

Is there a way to set the DNS value directly?

Something like this:

- name: KAFKA_ADVERTISED_HOST_NAME
  valueFrom:
    fieldRef:
      fieldPath: kubedns.name

asked Nov 27 '17 17:11 by Daniel Chmielewski



2 Answers

I managed to solve the problem with a command field in the container definition of the pod template:

    command:
    - sh
    - -c
    - "export KAFKA_ADVERTISED_HOST_NAME=$(hostname).kafka-headless.default.svc.cluster.local &&
       start-kafka.sh"

This replaces the container's default command with a shell that exports KAFKA_ADVERTISED_HOST_NAME, built from the pod's hostname plus the DNS suffix of the headless service, and then runs start-kafka.sh.
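
The same logic can be written as a small wrapper script, which is easier to test than a one-liner embedded in YAML. This is only a sketch: the function name advertised_host is made up for illustration, the defaults (kafka-headless, default) are the values from my StatefulSet, and cluster.local is assumed as the cluster domain.

```shell
#!/bin/sh
# Compose the advertised host name for one pod of a StatefulSet.
# Arguments (hypothetical helper): pod name, then optionally the
# headless service name and the namespace.
advertised_host() {
  pod=$1
  service=${2:-kafka-headless}
  namespace=${3:-default}
  printf '%s.%s.%s.svc.cluster.local\n' "$pod" "$service" "$namespace"
}

# Inside the container, hostname returns the pod name (e.g. kafka-0).
KAFKA_ADVERTISED_HOST_NAME=$(advertised_host "$(hostname)")
export KAFKA_ADVERTISED_HOST_NAME
echo "$KAFKA_ADVERTISED_HOST_NAME"

# In a real entrypoint the next step would be: exec start-kafka.sh
```

Note that the service and namespace are still hardcoded here, so the script has to be adjusted if the StatefulSet is deployed under a different service name or namespace.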

answered Oct 16 '22 02:10 by Daniel Chmielewski


- name: MY_POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name 
- name: KAFKA_ZOOKEEPER_CONNECT
  value: zook-zookeeper.zook.svc.cluster.local:2181
- name: KAFKA_PORT_NUMBER
  value: "9092"
- name: KAFKA_LISTENERS
  value: SASL_SSL://:$(KAFKA_PORT_NUMBER)
- name: KAFKA_ADVERTISED_LISTENERS
  value: SASL_SSL://$(MY_POD_NAME).kafka-kafka-headless.kafka.svc.cluster.local:$(KAFKA_PORT_NUMBER)

The above config builds the FQDN for each broker from its pod name. You should see those names in the Kafka logs when the broker starts.

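NOTE: Kubernetes allows you to reference environment variables in the value of another variable using the syntax $(VARIABLE), provided the referenced variable is defined earlier in the same env list. A reference that cannot be resolved is left unchanged.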
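
To get a feel for what the expanded value looks like, the substitution can be imitated locally. The sketch below is not how Kubernetes implements it; expand_refs is a hypothetical helper that replaces $(NAME) references using sed, and it assumes the values contain no characters that are special to sed (such as | or &). Unlike the shell's own $(command) syntax, nothing is executed here: the references are treated as plain text.

```shell
#!/bin/sh
# Imitate Kubernetes-style $(NAME) expansion on a template string.
# $1 is the template; the remaining arguments are NAME=value pairs
# standing in for variables defined earlier in the env list.
expand_refs() {
  result=$1
  shift
  for pair in "$@"; do
    name=${pair%%=*}
    value=${pair#*=}
    # Replace every literal $(NAME) with its value.
    result=$(printf '%s' "$result" | sed "s|\\\$($name)|$value|g")
  done
  # References without a matching pair are left as they are.
  printf '%s\n' "$result"
}

# Single quotes keep the shell from interpreting $(...) itself.
expand_refs 'SASL_SSL://$(MY_POD_NAME).kafka-kafka-headless.kafka.svc.cluster.local:$(KAFKA_PORT_NUMBER)' \
  MY_POD_NAME=kafka-0 KAFKA_PORT_NUMBER=9092
```

For pod kafka-0 this prints SASL_SSL://kafka-0.kafka-kafka-headless.kafka.svc.cluster.local:9092, which is the value the broker should report as its advertised listener.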

answered Oct 16 '22 02:10 by user1859658