
How to automatically restart a long job in Julia

I am running a long job on a cluster of computers. Occasionally the process is interrupted and I have to restart it manually. There is considerable downtime when the interruptions occur overnight. Is there a way to run a supervisor script in Julia that monitors whether the job is running in another instance of Julia? It would restart the process if it is interrupted and terminate once the job is finished. Unfortunately, I do not know how to check that the process is running or how to restart it. Here is the rough idea I have:

state = true
while state == true
    # check every minute
    sleep(60)
    # read the output file to check whether the process is finished
    data = readcsv("outputfile.csv")
    if size(data, 1) < N
        # some function to check whether the process is running
        if isrunning() == true
            # do nothing; keep running
        else
            # some function to spawn a new instance of Julia
            # and run the code
            include("myscript.jl")
        end
        end
    else
        #Job finished, exit while loop
        state = false
    end
end 
Christopher Fisher asked Jul 13 '16 11:07

1 Answer

Use the right tool for the job: your command-line shell. If a process is terminated prematurely, it returns a nonzero exit status.

For example, in Bash:

until julia myscript.jl; do
    echo "Failed/Interrupted. Restarting in 5s. Press Ctrl-C now to interrupt."
    sleep 5
done
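
If the script can fail for a reason that a restart will not fix (a bug, a missing input file), an unbounded loop would retry forever. One possible variation, sketched here rather than taken from the answer above, caps the number of attempts and passes the last exit status back to the caller. The function name `run_with_restart` and its argument order are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): rerun a command until
# it exits with status 0, giving up after a fixed number of attempts.
# Usage: run_with_restart MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
run_with_restart() {
    local max_attempts="$1"
    local delay="$2"
    local attempt=1
    local status=0
    shift 2
    while true; do
        if "$@"; then
            return 0                # the job finished cleanly
        else
            status=$?               # exit status of the failed run
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Giving up after $attempt attempts (last exit status $status)." >&2
            return "$status"
        fi
        echo "Attempt $attempt failed with status $status. Restarting in ${delay}s." >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example invocation (assumes myscript.jl exits nonzero when interrupted):
#   run_with_restart 10 5 julia myscript.jl
```

The cap of 10 attempts and the 5-second delay in the example invocation are arbitrary; pick values that match how often the cluster actually interrupts the job.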

Because Julia is not unusable as a command-line runner, you could do the same in Julia:

while true
    try
        run(`julia myscript.jl`) #Run a separate process
        break
    catch
        println("Failed/Interrupted. Restarting in 5s. Press Ctrl-C now to interrupt.")
        sleep(5)
    end
end
Lyndon White answered Sep 28 '22 02:09