 

Letting only one Elasticsearch pod come up on a node in Kubernetes

We have a multi-node setup of our product where we need to deploy multiple Elasticsearch pods. As all these are data nodes and have volume mounts for persistent storage, we don't want to bring two pods up on the same node. I'm trying to use the anti-affinity feature of Kubernetes, but to no avail.

The cluster deployment is done through Rancher. We have 5 nodes in the cluster, and three of them (say node-1, node-2 and node-3) have the label test.service.es-master: "true". So, when I deploy the Helm chart and scale it up to 3, Elasticsearch pods come up on all three of these nodes. But if I scale it to 4, the 4th data node lands on one of the already-occupied nodes. Is that correct behavior? My understanding was that imposing a strict anti-affinity should prevent two pods from coming up on the same node. I've referred to multiple blogs and forums (e.g. this and this), and they suggest changes similar to mine. I'm attaching the relevant section of the Helm chart.

The requirement is: we need to bring up ES only on those nodes that are labelled with the specific key-value pair mentioned above, and each of those nodes should host only one pod. Any feedback is appreciated.
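For reference, the set of eligible nodes can be checked up front with a plain kubectl label selector; nothing here is chart-specific, the label is just the one from our setup:

# Nodes carrying the label; with a hard anti-affinity rule in place,
# the number of schedulable replicas is capped by the size of this list.
kubectl get nodes -l test.service.es-master=true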

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    test.service.es-master: "true"
  name: {{ .Values.service.name }}
  namespace: default
spec:
  clusterIP: None
  ports:
  ...
  selector:
    test.service.es-master: "true"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    test.service.es-master: "true"
  name: {{ .Values.service.name }}
  namespace: default
spec:
  selector:
    matchLabels:
      test.service.es-master: "true"
  serviceName: {{ .Values.service.name }}
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: test.service.es-master
            operator: In
            values:
            - "true"
        topologyKey: kubernetes.io/hostname
  replicas: {{ .Values.replicaCount }}
  template:
    metadata:
      creationTimestamp: null
      labels:
        test.service.es-master: "true"
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: test.service.es-master
                operator: In
                values:
                  - "true"
              topologyKey: kubernetes.io/hostname
      securityContext:
        ...
      volumes:
        ...
      ...
status: {}

Update-1

As per the suggestions in the comments and answers, I've added the anti-affinity section under template.spec. But unfortunately the issue still remains. The updated YAML looks as follows:

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    test.service.es-master: "true"
  name: {{ .Values.service.name }}
  namespace: default
spec:
  clusterIP: None
  ports:
  - name: {{ .Values.service.httpport | quote }}
    port: {{ .Values.service.httpport }}
    targetPort: {{ .Values.service.httpport }}
  - name: {{ .Values.service.tcpport | quote }}
    port: {{ .Values.service.tcpport }}
    targetPort: {{ .Values.service.tcpport }}
  selector:
    test.service.es-master: "true"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    test.service.es-master: "true"
  name: {{ .Values.service.name }}
  namespace: default
spec:
  selector:
    matchLabels:
      test.service.es-master: "true"
  serviceName: {{ .Values.service.name }}
  replicas: {{ .Values.replicaCount }}
  template:
    metadata:
      creationTimestamp: null
      labels:
        test.service.es-master: "true"
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
            matchExpressions:
            - key: test.service.es-master
              operator: In
              values:
              - "true"
            topologyKey: kubernetes.io/hostname
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: test.service.es-master
                operator: In
                values:
                  - "true"
              topologyKey: kubernetes.io/hostname
      securityContext:
        readOnlyRootFilesystem: false
      volumes:
      - name: elasticsearch-data-volume
        hostPath:
          path: /opt/ca/elasticsearch/data
      initContainers:
      - name: elasticsearch-data-volume
        image: busybox
        securityContext:
          privileged: true
        command: ["sh", "-c", "chown -R 1010:1010 /var/data/elasticsearch/nodes"]
        volumeMounts:
        - name: elasticsearch-data-volume
          mountPath: /var/data/elasticsearch/nodes
      containers:
      - env:
        {{- range $key, $val := .Values.data }}
        - name: {{ $key }} 
          value: {{ $val | quote }}
        {{- end}}
        image: {{ .Values.image.registry }}/analytics/{{ .Values.image.repository }}:{{ .Values.image.tag }}
        name: {{ .Values.service.name }}
        ports:
        - containerPort: {{ .Values.service.httpport }}
        - containerPort: {{ .Values.service.tcpport }}
        volumeMounts:
        - name: elasticsearch-data-volume
          mountPath: /var/data/elasticsearch/nodes
        resources:
          limits:
            memory: {{ .Values.resources.limits.memory }}
          requests:
            memory: {{ .Values.resources.requests.memory }}
        restartPolicy: Always
status: {}
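One generic check that may help with a manifest like this (plain kubectl; the deployment name below is a placeholder): ask the API server for the object it actually stored. Older API versions such as extensions/v1beta1 may silently drop unknown or mis-nested fields rather than rejecting them, so if the podAntiAffinity block is missing from the stored object, the nesting in the manifest is off.

# Compare the stored object with the manifest; fields the server
# dropped (for example, a mis-nested labelSelector) won't appear here.
kubectl get deployment <deployment-name> -o yaml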
Asked Dec 26 '18 by Bitswazsky

1 Answer

As Egor suggested, you need podAntiAffinity:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"

Source: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#always-co-located-in-the-same-node

So, with your current label, it might look like this. Since it uses requiredDuringSchedulingIgnoredDuringExecution, the rule is hard: a replica that can't be placed on a free labelled node stays Pending instead of landing on an occupied one.

spec:
  affinity:
    nodeAffinity:
    # node affinity stuff here
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: "test.service.es-master"
            operator: In
            values:
            - "true"
        topologyKey: "kubernetes.io/hostname"

Ensure that you put this in the correct place in your YAML (under the pod template's spec, i.e. spec.template.spec, not directly under the Deployment's spec), or else it won't work.
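Once it's applied, a quick way to verify the behaviour (standard kubectl; the pending pod name below is a placeholder):

# Each Elasticsearch pod should show a distinct node in the NODE column.
kubectl get pods -l test.service.es-master=true -o wide

# A replica beyond the number of labelled nodes should stay Pending;
# its events should show the scheduling failure reason.
kubectl describe pod <pending-pod-name>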

Answered Oct 08 '22 by DiplomacyNotWar