What does the state 'drain' mean?

Tags:

slurm

When I use sinfo I see the following:

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
[...]
RG3          up 28-00:00:0      1  drain rg3hpc4
[...]
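To spot such nodes in a longer listing, the rows can be filtered on the STATE column. Per the sinfo man page, `drain` marks a node that is fully drained, while `drng` marks one that is still draining (it is still finishing jobs). The sketch below assumes sinfo's default six-column layout exactly as in the output above. `drained_rows` is a hypothetical helper name, and a custom `--format` would need a different parser.

```python
import re

# Compact sinfo state codes: "drain" = drained, "drng" = draining (jobs still running).
DRAIN_STATES = {"drain", "drng"}


def drained_rows(sinfo_output: str) -> list[tuple[str, str, str]]:
    """Return (partition, state, nodelist) for every drained or draining row.

    Assumes sinfo's default layout: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST.
    """
    rows = []
    for line in sinfo_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines and anything that is not a six-column row.
        if len(fields) != 6 or fields[0] == "PARTITION":
            continue
        partition, _avail, _timelimit, _nodes, state, nodelist = fields
        # Drop flag suffixes such as "*" (not responding) or "~" (powered off).
        base_state = re.sub(r"[^a-z]+$", "", state)
        if base_state in DRAIN_STATES:
            rows.append((partition, state, nodelist))
    return rows


sample = """PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
RG3          up 28-00:00:0      1  drain rg3hpc4
"""
print(drained_rows(sample))  # [('RG3', 'drain', 'rg3hpc4')]
```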

What does the state 'drain' mean?

Asked Mar 18 '14 13:03 by Martin Thoma


1 Answer

It means that no further jobs will be scheduled on that node, but the jobs currently running on it will keep running (in contrast with setting the node down, which kills all jobs running on the node).

Nodes are often put in that state so that maintenance can take place once all running jobs have finished.
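An administrator who drains a node for maintenance typically waits for it to go from `drng` (draining, jobs still running) to `drain` (drained, safe to service). The polling sketch below assumes that `sinfo -h -n NODE -o %t` prints the node's compact state. `node_state` and `wait_until_drained` are hypothetical helpers, and the state source and sleep function are injectable so the logic can be tried without a cluster.

```python
import re
import subprocess
import time
from typing import Callable


def node_state(node: str) -> str:
    """Compact state of one node, e.g. 'drng' or 'drain' (requires Slurm's sinfo)."""
    out = subprocess.run(
        ["sinfo", "-h", "-n", node, "-o", "%t"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    # A node in several partitions is printed once per partition; take the first.
    return out[0] if out else ""


def wait_until_drained(
    node: str,
    get_state: Callable[[str], str] = node_state,
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the node is fully drained; return how many polls it took."""
    polls = 0
    while True:
        polls += 1
        # Ignore flag suffixes such as '*' (not responding).
        state = re.sub(r"[^a-z]+$", "", get_state(node))
        if state == "drain":
            return polls
        if state != "drng":
            raise RuntimeError(f"{node} is {state!r}, neither draining nor drained")
        sleep(poll_seconds)


# Simulated states: two polls while jobs finish, then drained.
states = iter(["drng", "drng", "drain"])
print(wait_until_drained("rg3hpc4", get_state=lambda _n: next(states), sleep=lambda _s: None))  # 3
```

Raising on any other state (for example `idle` or `down`) avoids waiting forever on a node that was never drained.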

From the manpage of the scontrol command:

If you want to remove a node from service, you typically want to set its state to "DRAIN".
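In practice this is done with `scontrol update` and `State=DRAIN`. scontrol refuses to drain a node without a `Reason`, which is the text `sinfo -R` later displays, and `State=RESUME` returns the node to service. The sketch below only builds the argument lists, because running them needs Slurm administrator rights. `drain_cmd`, `resume_cmd` and `run` are hypothetical helper names, and the reason string is made up.

```python
import shlex
import subprocess


def drain_cmd(node: str, reason: str) -> list[str]:
    """argv for `scontrol update NodeName=<node> State=DRAIN Reason=<reason>`."""
    # scontrol rejects DRAIN without a reason, so fail early here too.
    if not reason.strip():
        raise ValueError("a Reason is required when draining a node")
    # Passed as one argv element, so no shell quoting is needed for spaces.
    return ["scontrol", "update", f"NodeName={node}", "State=DRAIN", f"Reason={reason}"]


def resume_cmd(node: str) -> list[str]:
    """argv that returns a drained node to service."""
    return ["scontrol", "update", f"NodeName={node}", "State=RESUME"]


def run(cmd: list[str]) -> None:
    """Execute on a real cluster (requires Slurm administrator privileges)."""
    subprocess.run(cmd, check=True)


# Show the equivalent shell command, quoted correctly.
print(shlex.join(drain_cmd("rg3hpc4", "disk replacement")))
```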

Note that the system administrator most likely gave a reason for draining the node, and you can see that reason with

sinfo -R
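To collect the reasons in a script, a delimiter-separated custom format is easier to parse than the default padded columns. The sketch below assumes that `sinfo -R -h -o "%N|%E"` prints the node list and the reason separated by `|` with no header, and that node lists never contain `|`. `drain_reasons` and `fetch_drain_reasons` are hypothetical helpers, and the sample reason is made up.

```python
import subprocess


def drain_reasons(output: str) -> dict[str, str]:
    """Map node list -> reason from `sinfo -R -h -o "%N|%E"` output."""
    reasons = {}
    for line in output.splitlines():
        if "|" not in line:
            continue  # skip blank or unexpected lines
        # Split on the first '|' only, so a '|' inside the reason survives.
        nodelist, reason = line.split("|", 1)
        reasons[nodelist.strip()] = reason.strip()
    return reasons


def fetch_drain_reasons() -> dict[str, str]:
    """Run sinfo on a live cluster and parse its output."""
    out = subprocess.run(
        ["sinfo", "-R", "-h", "-o", "%N|%E"],
        capture_output=True, text=True, check=True,
    ).stdout
    return drain_reasons(out)


# Hypothetical output for the node from the question.
print(drain_reasons("rg3hpc4|disk replacement\n"))  # {'rg3hpc4': 'disk replacement'}
```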
Answered Oct 17 '22 10:10 by damienfrancois