slurm.conf should set the RealMemory of each node to a value less than or equal to the memory actually available on that node. Otherwise the node will be put into a drained state.
How do I find out the memory value that Slurm gets from the OS and compares with RealMemory to decide whether the node should be drained?
Type make to compile Slurm. Type make install to install the programs, documentation, libraries, header files, etc. Build a configuration file using your favorite web browser and the Slurm Configuration Tool. NOTE: The SlurmUser must exist prior to starting Slurm and must exist on all nodes of the cluster.
You should check the log file (SlurmdLog in the slurm.conf file) for an indication of why it failed. You can get the status of the running slurmd daemon by executing the command "scontrol show slurmd" on the node of interest.
If you are using SLURM for your job scheduler, add a line for each type of GPU node. At the bottom, extend the NodeName= entry to include the additional nodes, or add a new line if the nodes are different. Then, from the head node, restart the services. Enable and start the Slurm daemon on the new compute nodes.
You can run slurmd -C on the compute node. From the man page:
-C
Print actual hardware configuration and exit. The format of output is
the same as used in slurm.conf to describe a node's configuration
plus its uptime.
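
To compare the detected value with what you have configured, you can pull RealMemory out of that output. The following is a minimal sketch, not an official tool: the helper names parse_slurmd_c and real_memory_ok are made up for illustration, and the sample string is an assumed example of what slurmd -C prints (the node name and numbers are invented). RealMemory is expressed in megabytes.

```python
def parse_slurmd_c(output: str) -> dict:
    """Collect every key=value token from slurmd -C style output into a dict."""
    fields = {}
    for token in output.split():
        key, sep, value = token.partition("=")
        if sep:  # skip anything that is not a key=value pair
            fields[key] = value
    return fields


def real_memory_ok(configured_mb: int, detected_mb: int) -> bool:
    """True when the configured RealMemory does not exceed the detected value."""
    return configured_mb <= detected_mb


# Assumed sample output; on a real compute node you could obtain it with
# subprocess.run(["slurmd", "-C"], capture_output=True, text=True).stdout
sample = (
    "NodeName=node01 CPUs=16 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 "
    "ThreadsPerCore=1 RealMemory=64000\n"
    "UpTime=12-03:45:10\n"
)

fields = parse_slurmd_c(sample)
detected_mb = int(fields["RealMemory"])

print("detected RealMemory (MB):", detected_mb)
print("64000 configured is safe:", real_memory_ok(64000, detected_mb))
print("65536 configured is safe:", real_memory_ok(65536, detected_mb))
```

If the configured value turns out to be larger than the detected one, lower RealMemory in slurm.conf for that node so that it is less than or equal to what slurmd -C reports.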