
Slurm setup: Error connecting slurm stream socket

Tags:

slurm

I am trying to set up a new cluster with Slurm. I have set up a client and a control machine. (I am new to this.)

  1. When I run sinfo -vvv from the control machine, it reports:

sinfo: debug2: slurm_connect failed: Connection refused
sinfo: debug2: Error connecting slurm stream socket at 192.168.155.142:6817: Connection refused

My Slurm is configured to use port 6817 (the full config is available here: https://pastebin.com/X4yDe99z):

SlurmctldPort=6817

The port is open (I also tried with ufw disabled):

6817 (v6) ALLOW Anywhere (v6)

  2. When I run slurmctld -Dvvv, it shows this error:

slurmctld: error: this host (xxxx/xxx) not a valid controller (gaia or (null))

My /etc/hosts file is:

127.0.0.1 localhost
192.168.155.142 gaia

knightrider asked Feb 28 '26 00:02



1 Answer

The value of the ControlMachine parameter in slurm.conf must exactly match the output of hostname -s on the machine where you start slurmctld; otherwise the daemon will not start.

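It seems that hostname -s on your machine does not output gaia. Replace gaia with the name hidden behind xxxx/xxx in the error message. The "Connection refused" from sinfo is most likely a consequence of the same problem: since slurmctld does not start, nothing is listening on port 6817.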
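To make the comparison concrete, here is a minimal sketch in Python of the check described above: read the ControlMachine value out of the slurm.conf text and compare it with the short host name. The helper names (control_machine, short_hostname, matches) and the SAMPLE excerpt are hypothetical, made up for illustration, and short_hostname only approximates hostname -s by cutting the system host name at the first dot.

```python
import socket

# Hypothetical slurm.conf excerpt, used only for illustration.
SAMPLE = """
# controller settings
ControlMachine=gaia
SlurmctldPort=6817
"""

def control_machine(conf_text):
    """Return the ControlMachine value found in slurm.conf text, or None."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # slurm.conf parameter names are case-insensitive
        if key.strip().lower() == "controlmachine":
            return value.strip()
    return None

def short_hostname():
    """Approximate `hostname -s`: the host name up to the first dot."""
    return socket.gethostname().split(".", 1)[0]

def matches(conf_text, host):
    """True if ControlMachine in conf_text equals the given host name."""
    return control_machine(conf_text) == host
```

On the controller you would pass the contents of your actual slurm.conf and short_hostname() to matches(); if it returns False, slurmctld can be expected to reject the host as in the error above.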

damienfrancois answered Mar 02 '26 13:03



