
Setting up slurm.conf file for single computer

Tags: slurm

Hi, I am trying to use a processing pipeline that is written to run on multi-computer clusters using Slurm, but I would prefer to run it on a single computer. I am on Ubuntu 18 and have installed slurm-wlm, but I have not been able to get the pipeline to read my slurm.conf file. I generated that file with the online Slurm Version 18.08 Configuration Tool, with the goal of running everything as a single node so I don't have to rewrite the pipeline code.

Every time I attempt to run the pipeline's sh script, the log file shows this error:

sbatch: error: _parse_next_key: Parsing error at unrecognized key: SlurmctldHost
sbatch: error: Parse error in file /etc/slurm-llnl/slurm.conf line 2: "SlurmctldHost=charlie-Z370M-D3H"
sbatch: fatal: Unable to process configuration file

charlie-Z370M-D3H is the hostname

Below is my slurm.conf. I hope someone can see what I need to change to get this to work.

#
SlurmctldHost=charlie-Z370M-D3H
#SlurmctldHost=
#
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=1
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=5000
#MaxStepCount=40000
#MaxTasksPerNode=128
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/cgroup
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/spool
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/affinity
TaskPluginParam=Sched
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
#
#
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
AccountingStoreJobComment=YES
ClusterName=cluster
#DebugFlags=
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
#JobContainerType=job_container/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
#SlurmctldLogFile=
SlurmdDebug=3
#SlurmdLogFile=
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=linux[1-32] CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=linux[1-32] Default=YES MaxTime=INFINITE State=UP

Michael Sughrue, asked Oct 28 '18 05:10



1 Answer

I had the same issue. The configuration tool on that webpage generates a slurm.conf for version 18.08, as the page itself states. The key "SlurmctldHost" was introduced in 18.08, so please verify that your installed version of Slurm is at least that recent.
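The version dependence can be sketched as a small shell helper. This is only an illustration: the function name controller_key is hypothetical, and it assumes the version string has the usual major.minor.patch form (for example 17.11.9).

```shell
# Hypothetical helper: print the slurm.conf key that names the controller host
# for a given Slurm version. Assumes a "major.minor.patch" version string.
controller_key() {
  version="$1"
  major="${version%%.*}"   # text before the first dot, e.g. 17
  rest="${version#*.}"     # text after the first dot, e.g. 11.9
  minor="${rest%%.*}"      # second component, e.g. 11 or 08
  minor="${minor#0}"       # drop one leading zero so "08" becomes "8"
  minor="${minor:-0}"      # fall back to 0 if nothing is left
  if [ "$major" -gt 18 ] || { [ "$major" -eq 18 ] && [ "$minor" -ge 8 ]; }; then
    echo "SlurmctldHost"   # key introduced in 18.08
  else
    echo "ControlMachine"  # key used before 18.08
  fi
}
```

For example, calling controller_key 17.11.9 prints ControlMachine, while controller_key 18.08.0 prints SlurmctldHost.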

You can check your version of Slurm by simply typing "dpkg -l | grep slurm" and noting which version is installed. For Ubuntu 18.x the default package installed is Slurm version 17.11.9. You might have to download the source code for the version you have installed from https://www.schedmd.com/archives.php to your local machine. Unpack it and look in the "/doc/html/" directory, where you will find the corresponding configurator HTML script for your version.

For example, if your version is 17.11.9, the key corresponding to "SlurmctldHost" (introduced in 18.08) is "ControlMachine". So use the configurator HTML script in your local Slurm doc directory to generate a valid slurm.conf for your installed version of Slurm. I did that and it works fine.
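As a quick way to see what the key change looks like on an existing file, here is a minimal sketch. It only covers the controller key: the function name downgrade_controller_key is hypothetical, it assumes the simple "SlurmctldHost=hostname" form used in the question, and it does not check for other settings that may differ between versions. Regenerating the file with the matching configurator remains the safer route.

```shell
# Hypothetical helper: print a copy of a slurm.conf in which the 18.08
# controller key is renamed to its pre-18.08 equivalent.
# The input file itself is not modified, and commented-out lines are untouched.
downgrade_controller_key() {
  sed 's/^SlurmctldHost=/ControlMachine=/' "$1"
}
```

One possible use is to write the result to a new file, for instance downgrade_controller_key /etc/slurm-llnl/slurm.conf > slurm.conf.new, and review it before replacing the original.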

Pälle, answered Sep 18 '22 13:09