I am porting this Python code...
with open(filename, 'r') as f:
results = [np.array(line.strip().split(' ')[:-1], float)
for line in filter(lambda l: l[0] != '#', f.readlines())]
...to Julia. I came up with:
results = [map(ss -> parse(Float64, ss), split(s, " ")[1:end-1])
for s in filter(s -> s[1] !== '#', readlines(filename))];
The main reason for this porting is a potential performance gain, so I timed the two snippets in a Jupyter notebook. For the Python version, %%timeit on the snippet above reports:
12.8 ms ± 44.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
For the Julia version, @benchmark returns (among other things):
mean time: 8.250 ms (2.62% GC)
So far so good; I do get a performance boost. However, a plain @time gives:
0.103095 seconds (130.44 k allocations: 11.771 MiB, 91.58% compilation time)
From this thread I inferred that this was probably caused by my -> function declarations. Indeed, if I replace my code with:
filt = s -> s[1] !== '#';
pars = ss -> parse(Float64, ss);
res = [map(pars, split(s, " ")[1:end-1])
for s in filter(filt, readlines(filename))];
and time only the last line, I get a more encouraging 0.073007 seconds (60.58 k allocations: 7.988 MiB, 88.33% compilation time); hurray! However, this kind of defeats the purpose (at least as I understand it) of anonymous functions, and might lead to a bunch of f1, f2, f3, ... definitions. Giving a name to my Python lambda function outside of the list comprehension does not seem to affect Python's runtime.
My question is: to get normal performance, should I systematically name my Julia functions? Note that this particular snippet is to be called in a loop over ~30k files. (Basically, what I am doing is reading files that are mixtures of space-separated floats and comment lines; each float line can have a different length, and I am not interested in the last element of the line. Any comments on my solutions are appreciated.)
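For reference, a minimal self-contained Python sketch of the claim above (with hypothetical in-line sample data instead of a real file, and plain float lists instead of np.array for self-containment): naming the filter function gives exactly the same results as the lambda.

```python
# Hypothetical sample data: comment lines start with '#'; float lines are
# space-separated, may differ in length, and the last field is dropped.
lines = ["# header", "1.0 2.0 3.0 9.9", "4.5 5.5 9.9"]

# Anonymous filter, as in the original snippet.
anon = [[float(s) for s in l.split(' ')[:-1]]
        for l in filter(lambda l: l[0] != '#', lines)]

# The same with a named filter function.
def not_comment(l):
    return l[0] != '#'

named = [[float(s) for s in l.split(' ')[:-1]]
         for l in filter(not_comment, lines)]

print(anon)           # [[1.0, 2.0, 3.0], [4.5, 5.5]]
print(anon == named)  # True: naming the function does not change the result
```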
(Side comment: wrapping s with strip completely messes up @benchmark, adding 10 ms to the mean, but does not seem to affect @time. Any reason why?)
Putting everything in a function as suggested by DNF fixes my "have to name my anonymous functions" problem. Using one of Vincent Yu's formulations:
function results(filename::String)::Vector{Vector{Float64}}
[[parse(Float64, s) for s in @view split(line, ' ')[1:end-1]]
for line in Iterators.filter(!startswith('#'), eachline(filename))]
end
@benchmark results(FN)
BenchmarkTools.Trial:
memory estimate: 3.74 MiB
allocs estimate: 1465
--------------
minimum time: 7.108 ms (0.00% GC)
median time: 7.458 ms (0.00% GC)
mean time: 7.580 ms (1.58% GC)
maximum time: 9.538 ms (14.84% GC)
--------------
samples: 659
evals/sample: 1
@time called on this function returns equivalent results after the first compilation run. I am happy with that.
However, my issue with strip persists:
function results_strip(filename::String)::Vector{Vector{Float64}}
[[parse(Float64, s) for s in @view split(strip(line), ' ')[1:end-1]]
for line in Iterators.filter(!startswith('#'), eachline(filename))]
end
@benchmark results_strip(FN)
BenchmarkTools.Trial:
memory estimate: 3.74 MiB
allocs estimate: 1465
--------------
minimum time: 15.155 ms (0.00% GC)
median time: 15.742 ms (0.00% GC)
mean time: 15.885 ms (0.75% GC)
maximum time: 19.089 ms (10.02% GC)
--------------
samples: 315
evals/sample: 1
The median time doubles. If I look at strip only:
function only_strip(filename::String)
[strip(line) for line in Iterators.filter(!startswith('#'), eachline(filename))]
end
@benchmark only_strip(FN)
BenchmarkTools.Trial:
memory estimate: 1.11 MiB
allocs estimate: 475
--------------
minimum time: 223.868 μs (0.00% GC)
median time: 258.227 μs (0.00% GC)
mean time: 325.389 μs (9.41% GC)
maximum time: 56.024 ms (75.09% GC)
--------------
samples: 10000
evals/sample: 1
The figures just do not add up. Could there be a type mismatch? Should I cast the results to something else?
Anonymous functions are faster to write than named ones, but calling more functions is more expensive, to a degree that eclipses any savings you might get from using anonymous functions: each function call adds a frame to the call stack, which introduces a small but non-trivial amount of overhead.
The advantage of an anonymous function is that it does not have to be stored in a separate file. This can greatly simplify programs, as calculations are often very simple, and the use of anonymous functions reduces the number of code files necessary for a program.
In order to (hopefully) clearly summarize what Colin T Bowers and DNF commented: if you use BenchmarkTools (@benchmark or @btime), the time is always measured after compilation. If you just use @time, the computed time includes compilation. Actually, you get this information in the output (where you see the % of time spent in compilation).
The conclusion is: use BenchmarkTools when you want the post-compilation runtime, and @time only when the one-off compilation cost is itself of interest.
Let me give you a canonical example showing the problem that will hopefully help you understand the issue better:
julia> x = rand(10^6);
julia> @time count(v -> v < 0.5, x) # a lot of compilation as everything needs to be compiled
0.033077 seconds (18.34 k allocations: 1.047 MiB, 110.16% compilation time)
499921
julia> @time count(v -> v < 0.5, x) # v -> v < 0.5 is a new function - it has to be compiled
0.013155 seconds (5.85 k allocations: 322.655 KiB, 95.92% compilation time)
499921
julia> @time count(v -> v < 0.5, x) # v -> v < 0.5 is a new function - it has to be compiled
0.017371 seconds (5.85 k allocations: 322.702 KiB, 95.37% compilation time)
499921
julia> f(x) = x < 0.5
f (generic function with 1 method)
julia> @time count(f, x) # f is a new function - it has to be compiled
0.011609 seconds (5.82 k allocations: 321.351 KiB, 95.85% compilation time)
499921
julia> @time count(f, x) # f was already compiled - we are fast
0.000596 seconds (2 allocations: 32 bytes)
499921
julia> @time count(f, x) # f was already compiled - we are fast
0.000621 seconds (2 allocations: 32 bytes)
499921
julia> @time count(<(0.5), x) # <(0.5) is a new callable - it has to be compiled
0.013751 seconds (7.71 k allocations: 456.232 KiB, 96.03% compilation time)
499921
julia> @time count(<(0.5), x) # <(0.5) is callable already compiled - we are fast
0.000504 seconds (2 allocations: 32 bytes)
499921
julia> @time count(<(0.5), x) # <(0.5) is callable already compiled - we are fast
0.000616 seconds (2 allocations: 32 bytes)
The point is that v -> v > 0.5 is a new function every time you write it, even if you use exactly the same definition: Julia has to create a new anonymous function each time you introduce one in global scope. This is easily seen here:
julia> v -> v > 0.5
#7 (generic function with 1 method)
julia> v -> v > 0.5
#9 (generic function with 1 method)
(note that the number increases - it is a different function)
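A loosely analogous effect can be demonstrated in Python: each evaluation of a lambda expression also yields a fresh function object (although Python, unlike Julia, pays no per-function compilation cost for it):

```python
# Two evaluations of the same lambda text produce two distinct function
# objects, much like Julia's increasing #7, #9 numbering above.
f1 = lambda v: v > 0.5
f2 = lambda v: v > 0.5

print(f1 is f2)            # False: distinct objects
print(f1(0.7) == f2(0.7))  # True: identical behavior
```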
Now have a look at >(0.5):
julia> >(0.5)
(::Base.Fix2{typeof(>), Float64}) (generic function with 1 method)
julia> >(0.5)
(::Base.Fix2{typeof(>), Float64}) (generic function with 1 method)
It is the same callable every time - so it has to be compiled only once.
Finally if you wrap things in a function, as DNF explained you have:
julia> test() = v -> v > 0.5
test (generic function with 1 method)
julia> test()
#11 (generic function with 1 method)
julia> test()
#11 (generic function with 1 method)
And as you can see, since the anonymous function is defined within a named function, it is the same anonymous function every time, so the number does not increase (it had to be compiled only once, the first time test got called).
Regarding the strip issue: the difference is visible with @btime but not with @time, because under @time the cost of strip is dwarfed by the cost of compilation. You are simply unable to see the difference then, but it is actually the same in both cases.
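This @time-versus-@btime distinction can be mimicked in Python (a sketch with a hypothetical one-time setup cost simulated by time.sleep): a single timed first call includes the one-time cost, while timeit-style repetition amortizes it away, just as @benchmark amortizes Julia's compilation.

```python
import time
import timeit

_cache = {}

def count_below(xs, threshold):
    # Hypothetical stand-in for Julia's compilation: the first call for a
    # given threshold pays a one-time setup cost; later calls reuse it.
    if threshold not in _cache:
        time.sleep(0.05)  # simulate the one-time "compilation" cost
        _cache[threshold] = True
    return sum(1 for v in xs if v < threshold)

xs = [0.1, 0.7, 0.4, 0.9]

start = time.perf_counter()
first = count_below(xs, 0.5)          # like @time on a fresh function
first_call = time.perf_counter() - start

# Like @benchmark/@btime: the one-time cost is already paid, so the
# measured per-call time no longer contains it.
per_call = timeit.timeit(lambda: count_below(xs, 0.5), number=1000) / 1000
```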
Bogumił Kamiński's answer is excellent. I'm just writing this to comment on your solution.
Note that you can use the DelimitedFiles
module from the standard library to read such files into matrices. Like this:
using DelimitedFiles
readdlm(filename, ' ', Float64; comments=true, comment_char='#')
But you may find this slower than your code because it reads data into a column-major matrix instead of a row-based vector of vectors. Which one is better depends on your needs. (And of course, there are many packages that can read delimited files into various structures.)
Regarding your solution, I suggest a few small changes that can improve performance and memory use:
- readlines and filter allocate new vectors that you don't keep. To avoid these memory allocations, use the iterator interface provided by eachline and Iterators.filter.
- [1:end-1] creates an unnecessary vector. Use view or the convenient @view macro to avoid an allocation.
In addition, I think it's clearer in this code to stick to either map or array comprehension instead of mixing the two.
The following code incorporates these changes (using map with do notation). In my test case, this improves speed by about 15% and memory use by about 30%:
results = map(Iterators.filter(!startswith('#'), eachline(filename))) do line
map(@view split(line, ' ')[1:end-1]) do s
parse(Float64, s)
end
end
If you prefer array comprehension over map, the following is identical:
results = [
[
parse(Float64, s)
for s in @view split(line, ' ')[1:end-1]
]
for line in Iterators.filter(!startswith('#'), eachline(filename))
]
As you point out in the comments, we can use broadcasting to eliminate the explicit inner loop, leading to this cleaner code:
results = map(Iterators.filter(!startswith('#'), eachline(filename))) do line
parse.(Float64, @view split(line, ' ')[1:end-1])
end
Or, equivalently, as a comprehension:
results = [
parse.(Float64, @view split(line, ' ')[1:end-1])
for line in Iterators.filter(!startswith('#'), eachline(filename))
]