I am running a snakemake pipeline on an HPC that uses slurm. The pipeline is rather long, consisting of ~22 steps. Periodically, snakemake will encounter a problem when attempting to submit a job, which results in the error
sbatch: error: Batch job submission failed: Socket timed out on send/recv operation
Error submitting jobscript (exit code 1):
I run the pipeline via an sbatch file with the following snakemake call:
snakemake -j 999 -p --cluster-config cluster.json --cluster 'sbatch --account {cluster.account} --job-name {cluster.job-name} --ntasks-per-node {cluster.ntasks-per-node} --cpus-per-task {threads} --mem {cluster.mem} --partition {cluster.partition} --time {cluster.time} --mail-user {cluster.mail-user} --mail-type {cluster.mail-type} --error {cluster.error} --output {cluster.output}'
This produces output not only for the snakemake sbatch job itself, but also for each of the jobs that snakemake submits. The above error appears in the slurm.out of the sbatch file.
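For context, cluster.json is laid out roughly like this (the values below are placeholders rather than my actual settings):

{
    "__default__" :
    {
        "account"         : "my_account",
        "job-name"        : "pipeline_job",
        "ntasks-per-node" : 1,
        "mem"             : "4G",
        "partition"       : "normal",
        "time"            : "01:00:00",
        "mail-user"       : "user@example.com",
        "mail-type"       : "FAIL",
        "error"           : "logs/job.err",
        "output"          : "logs/job.out"
    }
}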
The specific job step indicated by the error actually runs successfully and produces its output, but the pipeline still fails. The logs for that job step show that the job ID ran without a problem. I have googled this error, and it appears to be common with slurm, especially when the scheduler is under heavy load, which suggests it will be a regular and more or less unavoidable occurrence. I was hoping someone has encountered this problem and could suggest a workaround, so that the entire pipeline doesn't fail.
snakemake has the options --max-jobs-per-second and --max-status-checks-per-second, both with a default value of 10. Maybe try decreasing them to reduce the strain on the scheduler? Also, maybe try reducing -j 999.
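For example, something like the call below, where the specific values (1 per second, -j 100) are just a starting point to experiment with rather than recommended settings:

snakemake -j 100 -p \
    --max-jobs-per-second 1 \
    --max-status-checks-per-second 1 \
    --cluster-config cluster.json \
    --cluster 'sbatch --account {cluster.account} --job-name {cluster.job-name} --ntasks-per-node {cluster.ntasks-per-node} --cpus-per-task {threads} --mem {cluster.mem} --partition {cluster.partition} --time {cluster.time} --mail-user {cluster.mail-user} --mail-type {cluster.mail-type} --error {cluster.error} --output {cluster.output}'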