I am reading Julia performance tips, https://docs.julialang.org/en/v1/manual/performance-tips/
At the beginning, it mentions two examples.
Example 1:
julia> x = rand(1000);

julia> function sum_global()
           s = 0.0
           for i in x
               s += i
           end
           return s
       end;

julia> @time sum_global()
  0.009639 seconds (7.36 k allocations: 300.310 KiB, 98.32% compilation time)
496.84883432553846

julia> @time sum_global()
  0.000140 seconds (3.49 k allocations: 70.313 KiB)
496.84883432553846
We see a lot of memory allocations.
Now example 2:
julia> x = rand(1000);

julia> function sum_arg(x)
           s = 0.0
           for i in x
               s += i
           end
           return s
       end;

julia> @time sum_arg(x)
  0.006202 seconds (4.18 k allocations: 217.860 KiB, 99.72% compilation time)
496.84883432553846

julia> @time sum_arg(x)
  0.000005 seconds (1 allocation: 16 bytes)
496.84883432553846
We see that by passing x as an argument to the function, the memory allocations almost disappear and the code runs much faster.
My questions are, can anyone explain:

1. Why does example 1 need so many allocations, and why does example 2 not need as many allocations as example 1? I am a little confused.

2. In the two examples, we see that the second run is always faster than the first. Does that mean we need to run everything twice? If Julia is only fast on the second run, then what is the point? Why doesn't Julia just compile first and then run, like Fortran?

3. Is there any general rule for preventing memory allocations? Or do we always have to use @time to identify the issue?
Thanks!
Why does example 1 need so many allocations, and why does example 2 not need as many allocations as example 1?
Example 1 needs so many allocations because x is a global variable (defined outside the scope of the function sum_global). Therefore the type of the variable x can potentially change at any time, i.e. it is possible that:

1. you define x and sum_global,
2. you compile sum_global,
3. you redefine x (change its type) and run sum_global again.

In particular, as Julia supports multithreading, both actions in step 3 could in general even happen in parallel (i.e. you could change the type of x in one thread while sum_global is running in another thread).
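To make step 3 concrete, here is a small sketch (my own illustration, continuing the REPL session from example 1) of a situation the compiled code has to be prepared for:

julia> x = rand(Int, 1000);   # rebind the global: x is now a Vector{Int}, not a Vector{Float64}

julia> sum_global()           # the very same compiled code must still run correctly

Since nothing stops this rebinding, the code compiled for sum_global cannot assume anything about the type of x and must look it up on every access.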
So, because the type of x can change after sum_global is compiled, Julia, when compiling sum_global, has to ensure that the compiled code does not rely on the type that x happened to have when the compilation took place. Instead, in such cases Julia allows the type of x to change dynamically. However, this dynamic nature of x means that it has to be checked at run time (not compile time), and this dynamic checking of x is what causes the performance degradation and the memory allocations.
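You can actually see this dynamic checking with the built-in @code_warntype macro (a standard Julia diagnostic; its exact output varies between Julia versions): the global x is reported with the type Any, which is what forces the run-time dispatch and the boxing allocations.

julia> @code_warntype sum_global()   # look for x::Any (often highlighted in red) in the output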
You could have fixed this by declaring x to be a const (as const ensures that the type of x may not change):
julia> const x = rand(1000);

julia> function sum_global()
           s = 0.0
           for i in x
               s += i
           end
           return s
       end;

julia> @time sum_global() # this is now fast
  0.000002 seconds
498.9290555615045
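As a side note (my addition, not part of the original tip): since Julia 1.8 you can alternatively give the global a type annotation in a fresh session, which also fixes the problem while still letting you rebind x to another Vector{Float64} later:

julia> x::Vector{Float64} = rand(1000);   # typed global: the type of x may no longer change

julia> # define sum_global exactly as before, then:

julia> @time sum_global()   # fast: the compiler may now rely on the declared type of x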
Why doesn't Julia just compile first and then run, like Fortran?
This is exactly what Julia does. However, the benefit of Julia is that it does the compilation automatically, when needed. This allows for a smooth interactive development process.
If you wanted, you could compile the function before it is run with the precompile function and then run it separately. However, normally people just run the function without doing this explicitly.
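For example (a minimal sketch: precompile takes the function and a tuple of argument types, and returns true when compilation succeeds):

julia> precompile(sum_arg, (Vector{Float64},))
true

julia> @time sum_arg(x)   # even the first timed call now skips the compilation cost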
The consequence is that if you use @time:

- on the first call you measure both the compilation time and the execution time of the function (note the "98.32% compilation time" and "99.72% compilation time" parts of the output above);
- on later calls you measure only the execution time.
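As a side note (my addition): the standard way to time Julia code while excluding compilation and other noise is the BenchmarkTools.jl package; it runs the expression many times, and the $x interpolation keeps the non-constant global from distorting the measurement.

julia> using BenchmarkTools

julia> @btime sum_arg($x)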
Is there any general rule for preventing memory allocations?
These rules are exactly given in the Performance Tips section of the manual that you are quoting in your question. The tip on using @time is a diagnostic tip there; all the other tips are rules that are recommended to get fast code. I understand that the list is long, though, so the single rule that is good enough to start with, in my experience, is the one your own examples illustrate: avoid non-constant global variables, and pass the data a function works on as arguments instead.
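As a side note (my addition rather than part of the original answer): a quick way to check allocations without parsing @time output is the built-in @allocated macro, which returns the number of bytes an expression allocates. Call the functions once first so that compilation allocations are excluded:

julia> sum_arg(x); sum_global();   # warm up: compile both functions first

julia> @allocated sum_arg(x)       # essentially zero bytes allocated

julia> @allocated sum_global()     # many bytes, caused by the non-constant global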