 

Difference between NFS-PV, hostPath-PV on NFS and hostPath mount in deployment

I have a Kubernetes cluster setup (on-premise), that has an NFS share (my-nfs.internal.tld) mounted to /exports/backup on each node to create backups there.

Now I'm setting up my logging stack and I wanted to make the data persistent. So I figured I could start by storing the indices on the NFS.

Now I found three different ways to achieve this:

NFS-PV

apiVersion: v1
kind: PersistentVolume
metadata:
  name: logging-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data   # must match the PVC below, or the claim stays pending
  nfs:
    server: my-nfs.internal.tld
    path: /path/to/exports/backup/logging-data/
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logging-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      ...
      volumes:
        - name: logging-data-volume
          persistentVolumeClaim:
            claimName: logging-data-pvc

This would, of course, require that my cluster gets access to the NFS directly (instead of only the nodes, as it is currently set up).

hostPath-PV

apiVersion: v1
kind: PersistentVolume
metadata:
  name: logging-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data   # must match the PVC below, or the claim stays pending
  hostPath:
    path: /exports/backup/logging-data/
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logging-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      ...
      volumes:
        - name: logging-data-volume
          persistentVolumeClaim:
            claimName: logging-data-pvc

hostPath mount in deployment

As the NFS is mounted on all my nodes, I could also just use the host path directly in the deployment, without pinning anything down in a PV or PVC.

apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      ...
      volumes:
        - name: logging-data-volume
          hostPath:
            path: /exports/backup/logging-data
            type: DirectoryOrCreate

So my question is: is there really any difference between these three? I'm pretty sure all three work; I have already tested the second and third, though I have not yet been able to test the first (in this specific setup, at least).

The second and third solutions in particular seem very similar to me. The second makes it easier to re-use deployment files across multiple clusters, I think, since you can use persistent volumes of different types without changing the volumes section of the deployment. But is there any difference beyond that? Performance, maybe? Or is one of them deprecated and about to be removed?
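(As an aside, here is a minimal sketch of what that portability would look like in practice: on a cloud cluster, the hand-made PV above could be replaced by a StorageClass that provisions volumes dynamically, while the PVC and the Deployment stay exactly the same. The provisioner name below assumes the AWS EBS CSI driver and is only an example; any other cloud's CSI provisioner works the same way.)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: logging-data            # same name the PVC already requests
provisioner: ebs.csi.aws.com    # example: AWS EBS CSI driver; swap for your cloud's driver
reclaimPolicy: Retain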

I found a tutorial mentioning that a hostPath PV only works on single-node clusters. But I'm sure it also works in my case here. Maybe the comment meant: on multi-node clusters, the data changes when the pod is scheduled to a different node.
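(For completeness: if one wanted deterministic data with a plain hostPath on a multi-node cluster that does not have a shared mount behind it, the pod would have to be pinned to a single node. A minimal sketch using the standard kubernetes.io/hostname label; the node name node-1 is hypothetical:)

apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      nodeSelector:
        kubernetes.io/hostname: node-1   # hypothetical node name
      ...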

From reading a lot of documentation and how-tos, I understand that the first one is the preferred solution. I would probably also go with it, as it is the easiest to replicate in a cloud setup. But I do not really understand why it is preferred over the other two.

Thanks in advance for your input on the matter!

Asked Nov 06 '22 by Max N.

1 Answer

The NFS PV is indeed the preferred solution. From the Kubernetes documentation on volumes:

An nfs volume allows an existing NFS (Network File System) share to be mounted into a Pod. Unlike emptyDir, which is erased when a Pod is removed, the contents of an nfs volume are preserved and the volume is merely unmounted. This means that an NFS volume can be pre-populated with data, and that data can be shared between pods. NFS can be mounted by multiple writers simultaneously.

So, an NFS volume is useful for two reasons:

  • Data is persistent.

  • It can be accessed from multiple pods at the same time and the data can be shared between pods.

See the NFS example for more details.
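To actually get that shared-access behaviour, the PV and PVC would declare ReadWriteMany rather than the ReadWriteOnce used in the question. A minimal sketch, reusing the server and path from the question:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: logging-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany             # NFS supports many simultaneous writers
  storageClassName: logging-data
  nfs:
    server: my-nfs.internal.tld
    path: /path/to/exports/backup/logging-data/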

While for hostPath, the same documentation says:

A hostPath volume mounts a file or directory from the host node's filesystem into your Pod.

Pods with identical configuration (such as created from a PodTemplate) may behave differently on different nodes due to different files on the nodes

The files or directories created on the underlying hosts are only writable by root. You either need to run your process as root in a privileged Container or modify the file permissions on the host to be able to write to a hostPath volume
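As a sketch, the "run as root" workaround from that quote would look like this inside the Deployment's pod spec (the container name, image, and mount path are placeholders):

spec:
  containers:
    - name: logging                    # placeholder name
      image: example/logging:latest    # placeholder image
      securityContext:
        runAsUser: 0                   # run as root so the process can write to the hostPath
      volumeMounts:
        - name: logging-data-volume
          mountPath: /data             # mount point inside the container; adjust as needed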

hostPath is not recommended for several reasons:

  • You don't directly control which node your pods will run on, so you're not guaranteed that the pod will actually be scheduled on the node that has the data volume.

  • You expose your cluster to security threats.

  • If a node goes down, the pod needs to be scheduled on another node, where your locally provisioned volume will not be available.

hostPath would be a good fit if, for example, you would like to use it for a log collector running in a DaemonSet, as sketched below. Other than that, it is better to use the NFS.
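A minimal sketch of that DaemonSet use case: one collector pod per node, reading that node's own log directory, where the per-node nature of hostPath is exactly what you want. The image and names are placeholders, not a specific product:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: collector
          image: example/log-collector:latest  # placeholder image
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log   # per-node data is exactly what a node-local collector wants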

Answered Nov 15 '22 by Wytrzymały Wiktor