 

SLURM and Python: nodes are allocated, but the code only runs on one node

I have a 4*64 CPU cluster. I installed SLURM, and it seems to be working: if I call sbatch I get the proper allocation and queue. However, if I use more than 64 cores (so basically more than one node), it allocates the correct number of nodes, but if I ssh into the allocated nodes I only see actual work on one of them. The rest just sit there doing nothing.

My code is complex and uses multiprocessing. I create pools with around 300 workers, so I guess that should not be the problem.

What I would like to achieve is to call sbatch myscript.py on, say, 200 cores, and have SLURM distribute my run across those 200 cores, rather than just allocating the correct number of nodes while actually using only one.

The header of my Python script looks like this:

#!/usr/bin/python3

#SBATCH --output=SLURM_%j.log
#SBATCH --partition=part
#SBATCH -n 200

and I call the script with sbatch myscript.py.

Gábor Erdős, asked Nov 30 '16


People also ask

What are Slurm nodes?

Slurm is a job scheduler that manages cluster resources. It is what allows you to run a job on the cluster without worrying about finding a free node. It also tracks resource usage so nodes aren't overloaded by having too many jobs running on them at once.

How do I know if my Slurm is running?

You can get the status of the running slurmd daemon by executing the command "scontrol show slurmd" on the node of interest. Check the value of "Last slurmctld msg time" to determine if the slurmctld is able to communicate with the slurmd.

What Shell does Slurm use?

Slurm processes are not run under a shell, but directly exec'ed by the slurmd daemon (assuming srun is used to launch the processes).

What is Ntasks per node?

--ntasks-per-node=<ntasks> - Request that ntasks be invoked on each node. If used with the --ntasks option, the --ntasks option will take precedence and the --ntasks-per-node will be treated as a maximum count of tasks per node. Meant to be used with the --nodes option.
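
For instance, a minimal (hypothetical) batch-script header matching the 200-core request from the question could combine the two options like this:

#!/usr/bin/bash

#SBATCH --nodes=4
#SBATCH --ntasks-per-node=50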


3 Answers

Unfortunately, multiprocessing does not allow working on several nodes. From the documentation:

the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine

One option, often used with Slurm, is to use MPI (with the MPI4PY package), but MPI is considered to be 'the assembly language of parallel programming' and you will need to modify your code extensively.
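
As a rough illustration only, a minimal sketch of the MPI4PY approach might look like this (assuming mpi4py is installed and the script is launched as one task per core, e.g. with srun python3 script.py; the work() function is a hypothetical placeholder, not code from the question):

from mpi4py import MPI

def work(item):
    # placeholder for the real per-item computation
    return item * item

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this task's id, 0 .. size-1
size = comm.Get_size()   # total number of tasks across all nodes

items = list(range(1000))
# each rank processes a disjoint slice of the input
my_items = items[rank::size]
my_results = [work(x) for x in my_items]

# collect the partial results on rank 0
all_results = comm.gather(my_results, root=0)
if rank == 0:
    print(sum(len(r) for r in all_results), "items processed")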

Another option is to look into the Parallel Processing packages for one that suits your needs and requires minimal changes to your code. See also this other question for more insights.

A final note: it is perfectly fine to put the #SBATCH directives in the Python script and use the Python shebang. But as Slurm executes a copy of the script rather than the script itself, you must add a line such as

sys.path.append(os.getcwd()) 

at the beginning of the script (but after the #SBATCH lines) to make sure Python finds any module located in your directory.
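
Putting it together, the top of such a script would look something like this (a sketch based on the header from the question; the import lines are needed for the call above to work):

#!/usr/bin/python3

#SBATCH --output=SLURM_%j.log
#SBATCH --partition=part
#SBATCH -n 200

import os
import sys

# Slurm runs a copy of the script from its spool directory,
# so put the submission directory back on the module search path
sys.path.append(os.getcwd())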

damienfrancois, answered Oct 25 '22


I think your sbatch script should not be inside the Python script. Rather, it should be a normal bash script that includes the #SBATCH options, followed by the actual command to run via srun, like the following:

#!/usr/bin/bash

#SBATCH --output=SLURM_%j.log
#SBATCH --partition=part
#SBATCH -n 200

srun python3 myscript.py
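
This wrapper is then submitted with sbatch rather than being run directly (the file name myjob.sh here is just an illustration):

sbatch myjob.sh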

I suggest testing this with a simple Python script like this:

import multiprocessing as mp

def main():
    # reports the number of CPUs on the node this process runs on
    print("cpus =", mp.cpu_count())

if __name__ == "__main__":
    main()
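
To see what Slurm actually allocated, you could also print a few of the environment variables Slurm sets inside the job (a small sketch; these variable names are standard Slurm ones, not taken from the answer above):

import os

# set by Slurm in the job / task environment
print("nodes    :", os.environ.get("SLURM_JOB_NUM_NODES"))
print("nodelist :", os.environ.get("SLURM_JOB_NODELIST"))
print("ntasks   :", os.environ.get("SLURM_NTASKS"))
print("task id  :", os.environ.get("SLURM_PROCID"))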
Lukisn, answered Oct 25 '22


I tried to get around using different Python libraries by using srun on the following bash script. srun runs the script on each node allocated to the job. The basic idea is that the script determines which node it is running on and assigns that node an id of 0, 1, ..., nnodes-1. It then passes that information to the Python program, along with a thread id. In the program I combine these two numbers to make a distinct id for each CPU on each node. This code assumes that there are 16 cores on each node and that 10 nodes will be used.

#!/bin/bash

# names of the nodes allocated to this job
nnames=(`scontrol show hostnames`)
nnodes=${#nnames[@]}
nIDs=`seq 0 $(($nnodes-1))`

# work out which allocated node this copy of the script is running on
nID=0
hname=`hostname`
for i in $nIDs
do
    if [ "${nnames[$i]}" == "$hname" ]
        then nID=$i
    fi
done

# launch one python process per core (16 cores per node assumed)
tIDs=`seq 0 15`
for tID in $tIDs
do
    python testDataFitting2.py $nID $tID 160 &
done
wait
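
For completeness, one way to launch this wrapper would be a batch script along these lines (a sketch; node_wrapper.sh is a made-up name for the bash script above, and --ntasks-per-node=1 makes srun start it exactly once on each of the 10 nodes):

#!/bin/bash

#SBATCH --nodes=10
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16

# one copy of the wrapper per node; the wrapper itself
# starts 16 python processes on its node
srun bash node_wrapper.sh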
user1585635, answered Oct 25 '22