I am trying to set up a new cluster with Slurm. I have set up a client and a control machine (I am new to this). Running sinfo fails with:
sinfo: debug2: slurm_connect failed: Connection refused
sinfo: debug2: Error connecting slurm stream socket at 192.168.155.142:6817: Connection refused
My Slurm is configured to use port 6817 (the full config is available here: https://pastebin.com/X4yDe99z):
SlurmctldPort=6817
The port is open (I also tried with ufw disabled):
6817 (v6) ALLOW Anywhere (v6)
When I start slurmctld, it reports:
slurmctld: error: this host (xxxx/xxx) not a valid controller (gaia or (null))
My /etc/hosts file is:
127.0.0.1 localhost
192.168.155.142 gaia
The value of the ControlMachine parameter in slurm.conf must exactly match the output of hostname -s on the machine where you start slurmctld; otherwise the daemon will not start.
It seems hostname -s on your machine does not output gaia. In slurm.conf, replace gaia with what is hidden behind xxxx/xxx in the error message.
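If it helps, the comparison described above can be sketched in Python. This is only an illustration, not something Slurm ships: the helper names (control_machine, short_hostname, is_valid_controller) are made up for this example, and the short host name is approximated by cutting socket.gethostname() at the first dot, which may differ from hostname -s on unusual setups.

```python
import socket


def control_machine(conf_text):
    """Return the ControlMachine value found in slurm.conf text, or None."""
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        key, sep, value = line.partition("=")
        # slurm.conf parameter names are case-insensitive
        if sep and key.strip().lower() == "controlmachine":
            return value.strip()
    return None


def short_hostname():
    """Approximate `hostname -s`: the host name up to the first dot."""
    return socket.gethostname().split(".", 1)[0]


def is_valid_controller(conf_text, host=None):
    """True if ControlMachine matches the given (or local) short host name."""
    host = host or short_hostname()
    return control_machine(conf_text) == host


# Hypothetical slurm.conf fragment built from the values in the question.
sample = "ControlMachine=gaia\nSlurmctldPort=6817\n"
print(is_valid_controller(sample, host="gaia"))
```

To check a real machine, you could pass the contents of your own slurm.conf (its location depends on how Slurm was installed) and leave host unset, so that the local short host name is used.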