
If a Julia script is run from the command line, does it need to be re-compiled every time?

I've read through quite a bit of documentation and a number of questions, but I'm still confused about this.

In the Profiling section of the documentation it's suggested to first run the target function in the REPL once, so that it's already compiled before being profiled. However, what if the script is fairly complicated and is intended to be run from the command line, taking arguments? When the julia process finishes and I run the script a second time, is the compilation performed again? Posts like https://stackoverflow.com/a/42040763/1460448 and "Julia compiles the script every time?" give conflicting answers. They also seem to be old, while Julia is constantly evolving.

In my experience, the second run takes just as long as the first, and the startup time is quite long. How should I optimize such a program? Adding __precompile__() doesn't seem to have changed the execution time at all.

Also, what should I do when I want to profile such a program? All resources on profiling talk about doing so in the REPL.

asked May 30 '18 15:05 by xji



2 Answers

I disagree somewhat with my colleagues. There are absolutely valid scenarios where one would rely on running julia scripts. E.g. when you have a pipeline of scripts (e.g. matlab, python, etc) and you need to plug in a julia script somewhere in the middle of all that, and control the overall pipeline from a shell script. But, whatever the use case, saying "just use the REPL" isn't a proper answer to this question, and even if one couldn't come up with "valid" scenarios, it is still a question worth answering directly rather than with a workaround.

What I do agree on is that the way to get well-performing code is to wrap everything critical that needs to be precompiled into modules, and to leave only the outermost commands at the script top-level. This is not too dissimilar to the matlab or C++ world anyway, where you're expected to write thorough functions, and only treat your script / main function as a very brief, top-level entry point whose job is simply to prepare the initial environment and then run those more specialised functions accordingly.

Here's an example of what I mean:

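# in file 'myscript.jl'
# Add the script's own directory to the load path, so the module is found
# regardless of the directory the script is launched from.
push!( LOAD_PATH, @__DIR__ )
import MyPrecompiledModule
println( "Hello from the script. The arguments passed into it were $ARGS" )
MyPrecompiledModule.exportedfun()

# in file 'MyPrecompiledModule.jl' (e.g. in the same directory as myscript.jl)
__precompile__()   # opt in to precompilation (this is the default from Julia 1.0 onwards)
module MyPrecompiledModule
  export exportedfun

  # Internal helper; not exported
  function innerfun()
    println("Hello from MyPrecompiledModule.innerfun")
  end

  # Public entry point called by the script
  function exportedfun()
    innerfun()
    println("Hello from MyPrecompiledModule.exportedfun")
  end
end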

In the above scenario, the precompiled version of MyPrecompiledModule will be used in the script (and if one does not exist, it will be created the first time you run the script). The work done by precompilation is therefore not lost when the script ends, yet you still end up with a standalone julia script that accepts arguments and can be used as part of a bash shell script pipeline. The myscript.jl script then only has to pass these arguments on to the imported module functions if necessary, and run any other commands whose compilation / optimisation you don't particularly care about, such as performing benchmarks, printing script usage instructions, etc.
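As a rough sketch of the "thin script, fat module" layout described above, the script can simply forward ARGS to a single entry-point function. The names MyTool, main and sum_of_squares are hypothetical, and the module is defined inline here only to keep the example self-contained; in practice it would live in its own file on the load path, as in the example above.

```julia
# Hypothetical module; normally this would be in its own file, e.g. MyTool.jl
module MyTool

export main

# Parse the raw argument strings once, then hand typed values to the real work.
function main(args::AbstractVector{<:AbstractString})
    n = isempty(args) ? 10 : parse(Int, args[1])   # default of 10 is arbitrary
    return sum_of_squares(n)
end

# The actual computation lives in a function, so it is compiled and specialised.
sum_of_squares(n::Integer) = sum(i^2 for i in 1:n)

end # module

using .MyTool

# When run as `julia myscript.jl 100`, ARGS is ["100"].
result = main(ARGS)
println("Result: ", result)
```

Keeping the argument parsing in one place means everything downstream of main works with concrete types rather than strings.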

answered Sep 21 '22 10:09 by Tasos Papastylianou


Please correct me if I am wrong, but it sounds like you have written some long script, say, myfile.jl, and then from your OS command line you are calling julia myfile.jl args.... Is this correct? Also, it sounds like myfile.jl does not define much in the way of functions, but is instead just a sequence of commands. Is this correct? If so, then as has been suggested in the comments on the question, this is not the typical workflow for Julia, for two reasons:

1) Calling julia from the command line, i.e. julia myfile.jl args..., is equivalent to opening a REPL, running an include command on myfile.jl, and then closing the REPL. The initial call to include will compile any methods that are needed for the operations in myfile.jl, which takes time. But since you're running from the command line, once the include is finished, the REPL automatically closes, and all that compiled code is thrown away. This is what DNF means when he says the recommended workflow is to work within a single REPL session and not close it until you are done for the day, unless you deliberately want to recompile all the methods you are using.
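The per-session compilation cost described in point 1 can be observed directly by timing the same call twice within one process. This is only an illustrative sketch: mean_of_squares is a made-up workload, and the actual timings will vary by machine and Julia version.

```julia
# Hypothetical workload, used only to make the compilation cost visible
function mean_of_squares(xs)
    acc = zero(eltype(xs))
    for x in xs
        acc += x^2
    end
    return acc / length(xs)
end

data = rand(10_000)

first_call  = @elapsed mean_of_squares(data)   # includes compiling the method
second_call = @elapsed mean_of_squares(data)   # reuses the compiled method

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

Running this file twice from the command line repeats the slower first call each time, because the compiled method exists only for the lifetime of the process.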

2) Even if you are working within a single REPL session, it is extremely important to wrap pretty much everything you do in functions (this is a very different workflow from languages like Matlab). If you do this, Julia will compile methods for each function that are specialized on the types of the input arguments that you are using. This is essentially why Julia is fast. Once a method has been compiled, it remains available for the entire REPL session, but is disposed of when you close the REPL. Critically, if you do not wrap your operations in functions, then this specialized compilation does not occur, and so you can expect very slow code. In Julia, we call this "working in the global scope". Note that this feature of Julia encourages a coding style that breaks your tasks down into lots of small specialized functions rather than one behemoth consisting of 1000 lines of code. This is a good idea for many reasons. (In my own codebase, many functions are single-liners, and most are 5 lines or less.)
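To make the contrast in point 2 concrete, here is a small sketch of the same computation written once in the global scope and once inside a function. The names data, total and sum_squares are hypothetical; both versions produce the same value, but only the function version lets Julia compile code specialised on the element type of its argument.

```julia
data = rand(1_000)   # a non-constant global

# Global-scope style: the loop reads and updates untyped globals on every
# iteration, so no type-specialised code can be generated for it.
total = 0.0
for x in data
    global total += x^2
end

# Function style: compiled once per argument type (here Vector{Float64}).
function sum_squares(xs)
    acc = zero(eltype(xs))
    for x in xs
        acc += x^2
    end
    return acc
end

println(total ≈ sum_squares(data))
```

Wrapping each version in @time or @elapsed on a larger input is a simple way to see how much the difference matters on a given machine.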

The two points above are absolutely critical to understand if you are working in Julia. However, once you are comfortable with them, I would recommend that you actually put all your functions inside modules, and then call your module(s) from an active REPL session whenever you need them. This has the additional advantage that you can just add a __precompile__() statement at the top of your module, and then julia will precompile some (but not necessarily all) of the code in that module (from Julia 1.0 onwards, modules are precompiled by default, so the explicit statement is no longer required). Once you do this, the precompiled code in your module doesn't disappear when you close the REPL, since it is stored on disk in a .ji file. So you can start a new REPL session, type using MyModule, and your precompiled code is immediately available. It will only need to be recompiled if you alter the contents of the module (and this all happens automatically).
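On the profiling part of the question: once the work lives in functions, nothing requires the REPL, and the same warm-up-then-profile pattern can be written into a script. The following is a minimal sketch assuming Julia 1.x, where the profiler is in the Profile standard library; simulate is a hypothetical stand-in for the program's real entry point.

```julia
using Profile

# Hypothetical workload standing in for the script's real entry point
function simulate(n)
    acc = 0.0
    for i in 1:n
        acc += sin(i) * cos(i)
    end
    return acc
end

simulate(10)                    # warm-up call, so compilation is not profiled
Profile.clear()                 # discard any samples collected earlier
@profile simulate(10^7)         # collect samples while the real work runs
Profile.print(format = :flat)   # print a flat report to standard output
```

The warm-up call plays the same role as running the function once in the REPL before profiling, as the documentation suggests.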

answered Sep 21 '22 10:09 by Colin T Bowers