After reading some documentation on Persistent Volumes in Kubernetes, I am wondering what the best storage setup would be for running a highly available Elasticsearch cluster. I am not running the typical EFK (or ELK) setup; I am using Elasticsearch as a proper full-text search engine.
I've read the official Elastic documentation, but I find it rather unclear. According to "Kubernetes in Action", Chapter 6:
When an application running in a pod needs to persist data to disk and have that same data available even when the pod is rescheduled to another node, you can’t use any of the volume types we’ve mentioned so far. Because this data needs to be accessible from any cluster node, it must be stored on some type of network-attached storage (NAS).
So if I am not mistaken, I need a Volume and access it through PersistentVolumes and PersistentVolumeClaim with Retain policies.
When looking at the official Volumes documentation, I get the feeling that you have to define the Volume type yourself. However, the DigitalOcean guide does not seem to set up any Volume at all. I picked that tutorial, but there are dozens on Medium that all do the same thing.
So: which is the best setup for an Elasticsearch cluster? The goals, of course, are to not lose any data within an index and to be able to add pods (Kubernetes) or nodes (Elasticsearch) that can access the index.
A good pattern for deploying an Elasticsearch cluster in Kubernetes is to define a StatefulSet.
Because the StatefulSet replicates more than one Pod, you cannot simply reference a single persistent volume claim. Instead, you need to add a persistent volume claim template to the StatefulSet definition, so that each Pod gets its own claim.
For these replicated persistent volumes to work, you need to set up dynamic volume provisioning with a StorageClass, which allows storage volumes to be created on demand.
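As a rough sketch of what such a StorageClass could look like (many managed clusters already ship one, so you may not need to create it yourself): the name `es-storage` is hypothetical, and the provisioner shown is my assumption for the DigitalOcean CSI driver. Check `kubectl get storageclass` for what your cluster actually offers.

```yaml
# Sketch of a StorageClass for dynamic provisioning. Names are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: es-storage                       # hypothetical name
provisioner: dobs.csi.digitalocean.com   # assumption: replace with your cluster's provisioner
reclaimPolicy: Retain                    # keep the underlying volume when the claim is deleted
volumeBindingMode: WaitForFirstConsumer  # provision only once a Pod is scheduled
```

With `reclaimPolicy: Retain`, deleting a claim leaves the volume and its data in place, which is probably what you want for index data.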
In the DigitalOcean guide, the persistent volume claim template is as follows:
volumeClaimTemplates:
- metadata:
    name: data
    labels:
      app: elasticsearch
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: do-block-storage
    resources:
      requests:
        storage: 100Gi
Here, the StorageClass is do-block-storage. You can replace it with your own storage class.
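To show where the claim template sits, here is a trimmed-down StatefulSet sketch. The names `es-cluster` and `elasticsearch`, the replica count, and the image version are assumptions for illustration. Elasticsearch discovery settings, resource limits, and the headless Service are left out, so this is not deployable as-is.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster                  # hypothetical name
spec:
  serviceName: elasticsearch        # headless Service, assumed to exist already
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0  # assumed version
        volumeMounts:
        - name: data                # must match the claim template name below
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: do-block-storage   # replace with your own storage class
      resources:
        requests:
          storage: 100Gi
```

Each replica gets its own claim created from the template, so the three Pods never share a volume.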
Very interesting question,
Think of an Elasticsearch node in Kubernetes as equivalent to an Elasticsearch Pod.
Kubernetes needs to keep track of each Pod's identity so that it can reattach the Pod to the correct PersistentVolumeClaim after an outage. This is where the StatefulSet comes in.
A StatefulSet will ensure the same PersistentVolumeClaim stays bound to the same Pod throughout its lifetime.
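To make that binding concrete: claims created from a template are named `<template name>-<StatefulSet name>-<ordinal>`. Assuming a StatefulSet called `es-cluster` (a hypothetical name) with a claim template called `data`, the claim for the first Pod would look roughly like this:

```yaml
# Approximately what the StatefulSet controller creates for Pod es-cluster-0.
# You do not write this object yourself; it is shown only for illustration.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-es-cluster-0           # <template name>-<StatefulSet name>-<ordinal>
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: do-block-storage
  resources:
    requests:
      storage: 100Gi
```

If `es-cluster-0` is rescheduled to another node, it comes back with the same name and is matched to `data-es-cluster-0` again.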
A PersistentVolume (PV) is the Kubernetes abstraction for a piece of storage provided by the underlying infrastructure. This can be AWS EBS, DigitalOcean Volumes, etc.
I'd recommend having a look at the official Elasticsearch Helm chart: https://github.com/elastic/helm-charts/tree/master/elasticsearch
Also Elasticsearch Operator: https://operatorhub.io/operator/elastic-cloud-eck
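With the operator, storage is declared in the same way, through a claim template on the Elasticsearch resource. A minimal sketch, assuming the ECK operator is already installed; the cluster name, version, node count, and storage class are placeholders:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart                   # hypothetical cluster name
spec:
  version: 7.17.0                    # assumed version
  nodeSets:
  - name: default
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data     # the name ECK uses for the data volume
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: do-block-storage   # replace with your own storage class
        resources:
          requests:
            storage: 100Gi
```

The operator then manages the underlying StatefulSets and claims for you.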