I am running a long job on a cluster of computers. Occasionally the process is interrupted and I have to restart it manually, which causes considerable downtime when the interruptions happen overnight. Is there a way to run a supervisor script in Julia that monitors whether the job is running in another Julia instance? It would restart the process if it is interrupted and terminate once the job is finished. Unfortunately, I do not know how to check whether the process is running or how to restart it. Here is the rough idea I have:
```julia
state = true
while state == true
    # check every minute
    sleep(60)
    data = readcsv("outputfile.csv")
    # read file to check if process is finished
    if size(data, 1) < N
        # some function to check if the process is running
        if isrunning() == true
            # do nothing, keep running
        else
            # some function to spawn a new instance of julia
            # run the code
            include("myscript.jl")
        end
    else
        # job finished, exit while loop
        state = false
    end
end
```
Right tool for the right job: use your command-line shell. If a process is terminated prematurely, it exits with a non-zero (error) status code, which the shell can test.
E.g. in Bash:
```bash
until julia myscript.jl; do
    echo "Failed/Interrupted. Restarting in 5s. Press Ctrl-C now to interrupt."
    sleep 5
done
```
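To see the exit-status mechanism from Julia itself, here is a small sketch. It assumes nothing beyond Base: `Base.julia_cmd()` gives the command for the currently running Julia binary, and a child that calls `exit(3)` stands in for an interrupted job (`child` is a hypothetical helper name).

```julia
# Hypothetical helper: a Julia child process that exits with the given code.
child(code) = `$(Base.julia_cmd()) -e "exit($code)"`

# success(cmd) runs the command and returns true only for exit status 0.
ok = success(child(0))
failed = success(child(3))

# ignorestatus makes run() return normally, so the exit code can be inspected.
p = run(ignorestatus(child(3)))
println("ok = $ok, failed = $failed, exit code = $(p.exitcode)")
# prints: ok = true, failed = false, exit code = 3
```

`success` is also an exception-free alternative to wrapping `run` in `try`/`catch`.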
Julia is not unusable as a command-line runner either, so you could also do this in Julia:
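```julia
while true
    try
        run(`julia myscript.jl`)  # run the job as a separate process; throws if it fails
        break                     # exited normally: the job is finished
    catch err
        err isa InterruptException && rethrow()  # let Ctrl-C stop the supervisor
        println("Failed/Interrupted. Restarting in 5s. Press Ctrl-C now to interrupt.")
        sleep(5)
    end
end
```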
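The loop above blocks until the script exits. For something closer to the monitor sketched in the question (check periodically whether the job is still running, restart it if not, stop once the output is complete), `run(cmd; wait = false)` returns a `Process` that can be polled with `process_running`. This is a minimal sketch under assumptions: the helper names `job_finished` and `supervise`, the `max_restarts` limit, and the line-count completion test (swapped in for `readcsv` plus `size(data, 1)`, so a header line would count as a row) are all illustrative, not part of the question or the answer.

```julia
# Illustrative completion check (hypothetical helper): the job counts as
# finished once the output file has at least `n` lines.
function job_finished(path::AbstractString, n::Integer)
    isfile(path) || return false
    return countlines(path) >= n
end

# Hypothetical supervisor: start `cmd` in the background, poll every
# `interval` seconds, and restart it whenever it has stopped while
# `isdone()` is still false. Returns the number of restarts performed.
function supervise(cmd::Cmd, isdone::Function; interval::Real = 60,
                   max_restarts::Integer = 100)
    # With wait = false, run() returns immediately and the child's output
    # goes to devnull, so redirect it to this process's terminal.
    launch() = run(pipeline(cmd; stdout = stdout, stderr = stderr); wait = false)
    proc = launch()
    restarts = 0
    while true
        sleep(interval)
        if isdone()
            wait(proc)  # let a job that is still shutting down exit
            return restarts
        end
        if !process_running(proc)
            restarts >= max_restarts && error("giving up after $restarts restarts")
            @warn "Job stopped before finishing; restarting" exitcode = proc.exitcode
            restarts += 1
            proc = launch()
        end
    end
end

# Usage (hypothetical file name and row count):
# supervise(`julia myscript.jl`, () -> job_finished("outputfile.csv", 1000))
```

The `max_restarts` cap is there so that a job which keeps exiting without completing its output does not restart forever; for unattended overnight runs, redirecting `stdout`/`stderr` to a log file in the `pipeline` call may be more practical than the terminal.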