In my Kubernetes cluster I want to define a StatefulSet using a local persistent volume on each node. My Kubernetes cluster has three worker nodes.
My StatefulSet looks something like this:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myset
spec:
  replicas: 3
  ...
  template:
    spec:
      ...
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - myset
              topologyKey: kubernetes.io/hostname
      containers:
      - ...
        volumeMounts:
        - name: datadir
          mountPath: /data
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes:
      - "ReadWriteOnce"
      storageClassName: "local-storage"
      resources:
        requests:
          storage: 10Gi
I want to achieve that each pod, running on a separate node, uses a local data volume.
I defined a StorageClass object:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
and the following PersistentVolume:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: datadir
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /var/lib/my-data/
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-node-1
But of course this did not work, because I defined a nodeAffinity with only the hostname of my first node, worker-node-1. As a result, I can see only one PV. The PVC and the pod on the corresponding node start as expected, but on the other two nodes there are no PVs. How can I define that a local PersistentVolume is created for each worker node?
I also tried to define a nodeAffinity with 3 values:
nodeAffinity:
  required:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - worker-node-1
        - worker-node-2
        - worker-node-3
But this also did not work.
I fear that the PersistentVolume I define is the problem. This object creates exactly one PV, so only one of my pods finds the corresponding PV and can be scheduled.
Yes, you're right. By creating a PersistentVolume object, you create exactly one PersistentVolume. No more, no less. If you define 3 separate PVs, one available on each of your 3 nodes, you shouldn't experience any problem.
If you have, let's say, 3 worker nodes, you need to create 3 separate PersistentVolumes, each one with a different nodeAffinity. You don't need to define any nodeAffinity in your StatefulSet, as it is already handled at the PersistentVolume level and should be defined only there (complete example manifests follow at the end of this answer).
As you can read in the local volume documentation:
Compared to hostPath volumes, local volumes are used in a durable and portable manner without manually scheduling pods to nodes. The system is aware of the volume's node constraints by looking at the node affinity on the PersistentVolume.
Remember: PVC -> PV mapping is always 1:1. You cannot bind 1 PVC to 3 different PVs or the other way.
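To make that 1:1 relationship visible, here is a hedged sketch (the claim name is only an assumption based on the manifests above) of what a bound PV looks like: its spec.claimRef records exactly one claim, and no other PVC can ever bind to that PV.

# Illustrative excerpt of a bound PV, e.g. from `kubectl get pv datadir -o yaml`
apiVersion: v1
kind: PersistentVolume
metadata:
  name: datadir
spec:
  storageClassName: local-storage
  claimRef:            # filled in by the control plane at bind time
    namespace: default
    name: datadir      # exactly one PVC; a second PVC can never bind this PV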
So my only solution is to switch from local PVs to hostPath volumes, which is working fine.
Yes, it can be done with hostPath, but I wouldn't say it is the only or the best solution. Local volumes have several advantages over hostPath volumes and are worth considering. But as I mentioned above, in your use case you need to create the 3 separate PVs manually. You already created one PV, so it shouldn't be a big deal to create another two. This is the way to go.
I want to achieve that each pod, running on a separate node, uses a local data volume.
It can be achieved with local volumes, but in that case, instead of using a single PVC in your StatefulSet definition as in the below fragment from your configuration:
volumes:
- name: datadir
  persistentVolumeClaim:
    claimName: datadir
you need to use only volumeClaimTemplates as in this example, which may look as follows:
volumeClaimTemplates:
- metadata:
    name: www
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: "my-storage-class"
    resources:
      requests:
        storage: 1Gi
As you can see, the PVCs won't "look" for a PV with any particular name, so you can name them as you wish. They will "look" for a PV belonging to a particular StorageClass and, in this case, supporting the "ReadWriteOnce" access mode.
The scheduler will attempt to find an adequate node on which your stateful pod can be scheduled. If another pod has already been scheduled on, let's say, worker-1 and the only PV belonging to our local-storage storage class is no longer available there, the scheduler will try to find another node that meets the storage requirements. So again: there is no need for nodeAffinity / podAntiAffinity rules in your StatefulSet definition.
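For instance (a sketch only, using the names from your own StatefulSet), the PVC that the StatefulSet controller creates for the first replica follows the <volumeClaimTemplate>-<StatefulSet>-<ordinal> naming pattern and simply requests a volume from the local-storage class:

# Illustrative sketch of the PVC created automatically for pod myset-0
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datadir-myset-0      # <volumeClaimTemplate>-<StatefulSet>-<ordinal>
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 10Gi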
But I need some mechanism so that a PV is created for each node and assigned to the pods created by the StatefulSet. But this did not work - I always have only one PV.
In order to facilitate the management of volumes and automate the whole process to a certain extent, take a look at the Local Persistence Volume Static Provisioner. As its name already suggests, it doesn't support dynamic provisioning (as we have e.g. on various cloud platforms), which means you are still responsible for creating the underlying storage, but the whole volume lifecycle can be handled automatically.
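Purely as an illustration of how it is configured (a sketch based on the project's documented ConfigMap format; exact keys and defaults can differ between provisioner versions), you point it at a discovery directory per StorageClass and it creates one PV for every filesystem mounted underneath that directory on each node:

# Hedged example of the provisioner's ConfigMap; adjust names and paths
# to your environment before using it.
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config
  namespace: kube-system
data:
  storageClassMap: |
    local-storage:
      hostDir: /mnt/local-disks   # discovery directory on every node
      mountDir: /mnt/local-disks  # where the provisioner pod sees it
      volumeMode: Filesystem
      fsType: ext4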
To make this whole theoretical explanation somewhat more practical, I'm adding below a working example which you can quickly test for yourself. Make sure the /var/tmp/test directory is created on every node, or adjust the below examples to your needs. One way to pre-create that directory on all nodes without logging in to each of them is sketched right after this paragraph.
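(A hedged sketch of such a helper, with a hypothetical name; it relies on the hostPath type DirectoryOrCreate, which makes the kubelet create the directory on each node if it is missing. Delete the DaemonSet once the directories exist.)

# Hypothetical one-off DaemonSet that creates /var/tmp/test on every node
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: create-local-dir       # hypothetical name
spec:
  selector:
    matchLabels:
      app: create-local-dir
  template:
    metadata:
      labels:
        app: create-local-dir
    spec:
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        volumeMounts:
        - name: testdir
          mountPath: /data
      volumes:
      - name: testdir
        hostPath:
          path: /var/tmp/test
          type: DirectoryOrCreate   # kubelet creates the directory if missing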
StatefulSet components (slightly modified example from here):
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 3 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "local-storage"
      resources:
        requests:
          storage: 1Gi
StorageClass definition:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
And finally a PV. You need to create 3 versions of the below YAML manifest by setting different names, e.g. example-pv-1, example-pv-2 and example-pv-3, and different node names.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv-1 ### 👈 change it
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /var/tmp/test ### 👈 you can adjust the shared directory on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-node-1 ### 👈 change this value by setting your node name
So 3 different PVs for 3 worker nodes.