
Best practices for data storage with Elasticsearch and Kubernetes

After reading some documentation on Persistent Volumes in Kubernetes, I am wondering what the best storage setup would be for running a highly available Elasticsearch cluster. I am not running the typical EFK (or ELK) setup; I am using Elasticsearch as a proper full-text search engine.

I've read the official Elastic documentation, but I find it lacking in clarity. According to "Kubernetes in Action", Chapter 6:

When an application running in a pod needs to persist data to disk and have that same data available even when the pod is rescheduled to another node, you can’t use any of the volume types we’ve mentioned so far. Because this data needs to be accessible from any cluster node, it must be stored on some type of network-attached storage (NAS).

So, if I am not mistaken, I need a Volume that I access through a PersistentVolume and a PersistentVolumeClaim with a Retain reclaim policy.

Looking at the official Volumes documentation, I get the feeling that one has to define the Volume type oneself. However, the DigitalOcean guide does not seem to include any Volume setup. I picked that tutorial, but there are dozens on Medium that all do the same thing.

So: what is the best setup for an Elasticsearch cluster? The requirements are, of course, not to lose any data within an index, and to be able to add pods (Kubernetes) or nodes (Elasticsearch) that can access the index.

purple_lolakos asked Feb 12 '21 15:02



2 Answers

A good pattern for deploying an Elasticsearch cluster in Kubernetes is to define a StatefulSet.

Because a StatefulSet runs several Pod replicas, they cannot all simply reference one shared persistent volume claim. Instead, you add a persistent volume claim template (volumeClaimTemplates) to the StatefulSet definition, so that each Pod gets its own claim.

For these per-replica persistent volumes to work, you need dynamic volume provisioning, configured through a StorageClass, which allows storage volumes to be created on demand.
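As a rough illustration of what such a StorageClass can look like, here is a minimal sketch. The name es-block-storage is hypothetical, and the provisioner shown is assumed to be the DigitalOcean block storage CSI driver; substitute whatever provisioner your cluster actually runs. The Retain reclaim policy is the setting the question asks about: it keeps the underlying volume when its claim is deleted.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: es-block-storage                   # hypothetical name, pick your own
provisioner: dobs.csi.digitalocean.com     # assumption: DigitalOcean CSI driver; replace for other providers
reclaimPolicy: Retain                      # keep the volume (and its data) when the claim is deleted
volumeBindingMode: WaitForFirstConsumer    # provision only once a Pod using the claim is scheduled
allowVolumeExpansion: true                 # allow growing the volume later, if the driver supports it
```

Managed clusters usually ship with a default StorageClass already, which is likely why many tutorials never define one explicitly.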

In the DigitalOcean guide, the persistent volume claim template is as follows:

  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: do-block-storage
      resources:
        requests:
          storage: 100Gi

Here, the StorageClass is do-block-storage. You can replace it with your own storage class.
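To show where that template sits, here is a trimmed-down sketch of a complete StatefulSet together with the headless Service it relies on. The names es-cluster, elasticsearch and es-demo, the replica count, and the image version are assumptions for illustration, not taken from the guide verbatim. Production settings such as init containers for file permissions and vm.max_map_count, resource limits, and JVM heap options are left out.

```yaml
# Headless Service: gives each Pod a stable DNS name (es-cluster-0.elasticsearch, ...)
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
spec:
  clusterIP: None
  selector:
    app: elasticsearch
  ports:
  - name: rest
    port: 9200
  - name: transport
    port: 9300
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster                 # assumed name
spec:
  serviceName: elasticsearch       # must match the headless Service above
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0   # assumed version
        ports:
        - name: rest
          containerPort: 9200
        - name: transport
          containerPort: 9300
        env:
        - name: cluster.name
          value: es-demo
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name   # node name follows the Pod name
        - name: discovery.seed_hosts
          value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
        - name: cluster.initial_master_nodes
          value: "es-cluster-0,es-cluster-1,es-cluster-2"
        volumeMounts:
        - name: data                     # must match the claim template name below
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: do-block-storage
      resources:
        requests:
          storage: 100Gi
```

Scaling the StatefulSet up adds a new Pod with its own freshly provisioned volume; Elasticsearch then rebalances shards onto the new node, so the Pods do not need to share a single disk.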

Andrianekena Moise answered Oct 12 '22 09:10



Very interesting question.

Think of an Elasticsearch node as the equivalent of an Elasticsearch Pod in Kubernetes.

Kubernetes needs to keep track of each Pod's identity so that, after an outage, the Pod is reattached to the correct PersistentVolumeClaim. This is where the StatefulSet comes in.

A StatefulSet will ensure the same PersistentVolumeClaim stays bound to the same Pod throughout its lifetime.
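This binding works through predictable names: the claim for each Pod is named after the claim template, the StatefulSet, and the Pod's ordinal. As a sketch, assuming a StatefulSet called es-cluster with a claim template called data (both names hypothetical), the claim that Kubernetes generates for the first Pod would look roughly like this. You do not apply this manifest yourself; it is shown only to make the naming visible.

```yaml
# Generated by the StatefulSet controller for Pod es-cluster-0 (illustrative, not applied by hand)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-es-cluster-0          # <claim template name>-<StatefulSet name>-<ordinal>
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: do-block-storage   # assumption: whichever StorageClass the template names
  resources:
    requests:
      storage: 100Gi
```

If Pod es-cluster-0 is rescheduled to another node, the replacement Pod gets the same name and therefore picks up the same claim and data.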

A PersistentVolume (PV) is the Kubernetes abstraction for a piece of storage backed by the underlying infrastructure. This can be AWS EBS, DigitalOcean Volumes, etc.

I'd recommend having a look at the official Elasticsearch Helm chart: https://github.com/elastic/helm-charts/tree/master/elasticsearch

Also have a look at the Elasticsearch operator (ECK): https://operatorhub.io/operator/elastic-cloud-eck
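For a sense of what the operator route looks like, here is a minimal sketch of an ECK Elasticsearch resource with persistent storage. It assumes the ECK operator is already installed in the cluster. The resource name search, the version, the node count, and the storage class are placeholders.

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: search                       # placeholder name
spec:
  version: 7.17.0                    # assumed version
  nodeSets:
  - name: default
    count: 3                         # number of Elasticsearch nodes (Pods) in this set
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data     # claim name ECK uses for the data volume
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: do-block-storage   # placeholder: use a StorageClass that exists in your cluster
        resources:
          requests:
            storage: 100Gi
```

The operator creates and manages the underlying StatefulSets and claims itself, so the storage choices reduce to the claim template and the StorageClass it points at.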

sadok-f answered Oct 12 '22 08:10
