According to https://slurm.schedmd.com/quickstart_admin.html#HA high availability of SLURM is achieved by deploying a second BackupController which takes over when the primary fails and retrieves the current state from a shared file system (probably NFS).
In my opinion this has a number of drawbacks. E.g. this limits the total number of server to two and the second server is probably barely used.
Is this the only way to get a highly available head node with SLURM?
What I would like to do is a classic 3-tiered setup: A load balancer in the first tier which spreads all requests evenly across the nodes in the seconds tier. This requires the head node(s) to be stateless. The third tier is the database tier where all information is stored or read. I don't know anything about the internals of SLURM and I'm not sure if this is even remotely possible.
In the current design, the controller internal state is in-memory, and Slurm saves it to a set of files in the directory pointed to by the StateSaveLocation configuration parameter regularly. Only one instance of slurmctld can write to that directory at a time.
One problem with storing the state in the database would be a terrible latency in resource allocation with a lot of synchronisations needed, because optimal resource allocation can only be done with full information. The infrastructure needed to support the same level of throughput as Slurm can handle now with in-memory state would be very costly compared with the current solution implying only bitwise operations on arrays in memory.
Is this the only way to get a highly available head node with SLURM?
You can also have a single MasterController managed with Corosync. But indeed Slurm only has active/passive options available for HA.
In my opinion this has a number of drawbacks. E.g. this limits the total number of server to two and the second server is probably barely used.
The load on the controller is often very reasonable with respect to the current processing power, and the resource allocation problem cannot be trivially parallelised (or made stateless). Often, the backup controller is co-located on a machine running another service. For instance, on small deployments, one machine runs the Slurm primary controller, and other services (NFS, LDAP, etc.), etc. while another is the user login node, that also acts as a secondary Slurm controller.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With