Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Julia anonymous functions and performance

I am porting this Python code...

with open(filename, 'r') as f:
    results = [np.array(line.strip().split(' ')[:-1], float)
               for line in filter(lambda l: l[0] != '#', f.readlines())]

...to Julia. I came up with:

results = [map(ss -> parse(Float64, ss), split(s, " ")[1:end-1])
        for s in filter(s -> s[1] !== '#', readlines(filename))];

The main reason for this porting is a potential performance gain, so I timed the two snippets in a Jupyter notebook:

  • using %%timeit...
    • Python: 12.8 ms ± 44.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    • Julia: @benchmark returns (among other things) mean time: 8.250 ms (2.62% GC). So far so good; I do get a performance boost.
  • However, when using @time:
    • I get something in the lines of 0.103095 seconds (130.44 k allocations: 11.771 MiB, 91.58% compilation time). From this thread I inferred that it was probably caused by my -> function declarations.

Indeed, if I replace my code with:

filt = s -> s[1] !== '#';
pars = ss -> parse(Float64, ss);
res = [map(pars, split(s, " ")[1:end-1])
        for s in filter(filt, readlines(filename))];

and time only the last line, I get a more encouraging 0.073007 seconds (60.58 k allocations: 7.988 MiB, 88.33% compilation time); hurray! However, it kind of defeats the purpose (at least as I understand it) of anonymous functions and might lead to a bunch of f1, f2, f3, ... Giving a name to my Python lambda function out of the list comprehension does not seem to affect the Python's runtime.

My question is: to get normal performances, should I systematically name my Julia functions? Note that this particular snippet is to be called in a loop over ~30k files. (Basically, what I am doing is reading files that are mixtures of space-separated floats and comment lines; each float line can have a different length, and I am not interested in the last element of the line. Any comments on my solutions are appreciated.)

(Side comment: wrapping s with strip completely messes up @benchmark, adding 10 ms to the mean, but does not seem to affect @time. Any reason why?)


Putting everything in a function as suggested by DNF fixes my "have to name my anonymous functions" problem. Using one of Vincent Yu's formulations:

function results(filename::String)::Vector{Vector{Float64}}
    [[parse(Float64, s) for s in @view split(line, ' ')[1:end-1]]
        for line in Iterators.filter(!startswith('#'), eachline(filename))]
end

@benchmark results(FN)
BenchmarkTools.Trial: 
  memory estimate:  3.74 MiB
  allocs estimate:  1465
  --------------
  minimum time:     7.108 ms (0.00% GC)
  median time:      7.458 ms (0.00% GC)
  mean time:        7.580 ms (1.58% GC)
  maximum time:     9.538 ms (14.84% GC)
  --------------
  samples:          659
  evals/sample:     1

@time called on this function returns equivalent results after the first compilation run. I am happy with that.

However, this is my persisting issue with strip:

function results_strip(filename::String)::Vector{Vector{Float64}}
    [[parse(Float64, s) for s in @view split(strip(line), ' ')[1:end-1]]
        for line in Iterators.filter(!startswith('#'), eachline(filename))]
end

@benchmark results_strip(FN)
BenchmarkTools.Trial: 
  memory estimate:  3.74 MiB
  allocs estimate:  1465
  --------------
  minimum time:     15.155 ms (0.00% GC)
  median time:      15.742 ms (0.00% GC)
  mean time:        15.885 ms (0.75% GC)
  maximum time:     19.089 ms (10.02% GC)
  --------------
  samples:          315
  evals/sample:     1

The median time doubles. If I look at strip only:

function only_strip(filename::String)
    [strip(line) for line in Iterators.filter(!startswith('#'), eachline(filename))]
end

@benchmark only_strip(FN)
BenchmarkTools.Trial: 
  memory estimate:  1.11 MiB
  allocs estimate:  475
  --------------
  minimum time:     223.868 μs (0.00% GC)
  median time:      258.227 μs (0.00% GC)
  mean time:        325.389 μs (9.41% GC)
  maximum time:     56.024 ms (75.09% GC)
  --------------
  samples:          10000
  evals/sample:     1

Figures just do not add up. Could there be a type mismatch, should I cast the results to something else?

like image 200
Aubergine Avatar asked Jun 24 '21 02:06

Aubergine


People also ask

Are anonymous functions slow?

Anonymous objects are faster than named objects. But calling more functions is more expensive, and to a degree which eclipses any savings you might get from using anonymous functions. Each function called adds to the call stack, which introduces a small but non-trivial amount of overhead.

What are anonymous functions good for?

The advantage of an anonymous function is that it does not have to be stored in a separate file. This can greatly simplify programs, as often calculations are very simple and the use of anonymous functions reduces the number of code files necessary for a program.


2 Answers

In order to (hopefully) clearly summarize what Colin T Bowers and DNF commented:

  1. In Julia anonymous functions are as fast as named functions after compilation.
  2. The difference you are observing is caused by compilation time.
  3. When you use BenchmarkTools.jl (@btime) the time is always measured after compilation. If you just use @time the computed time includes compilation. Actually you get this information in the output (where you got % of time spent in compilation).
  4. Similarly if you put the whole expression in the function it would get compiled only once, while if you evaluate it in top-level scope it gets compiled every time you run it.

The conclusion is:

  1. If your code were really computationally expensive compilation time would not matter (as it is one-time constant cost), so you do not have to worry about it if you have a large computation to perform.
  2. However, if your computation is cheap the compilation time will be noticeable as then every time you introduce a new anonymous function in top-level scope it has to be compiled.

Let me give you a canonical example showing the problem that will hopefully help you understand the issue better:

julia> x = rand(10^6);

julia> @time count(v -> v < 0.5, x) # a lot of compilation as everything needs to be compiled
  0.033077 seconds (18.34 k allocations: 1.047 MiB, 110.16% compilation time)
499921

julia> @time count(v -> v < 0.5, x) # v -> v < 0.5 is a new function - it has to be compiled
  0.013155 seconds (5.85 k allocations: 322.655 KiB, 95.92% compilation time)
499921

julia> @time count(v -> v < 0.5, x) # v -> v < 0.5 is a new function - it has to be compiled
  0.017371 seconds (5.85 k allocations: 322.702 KiB, 95.37% compilation time)
499921

julia> f(x) = x < 0.5
f (generic function with 1 method)

julia> @time count(f, x) # f is a new function - it has to be compiled
  0.011609 seconds (5.82 k allocations: 321.351 KiB, 95.85% compilation time)
499921

julia> @time count(f, x) # f was already compiled - we are fast
  0.000596 seconds (2 allocations: 32 bytes)
499921

julia> @time count(f, x) # f was already compiled - we are fast
  0.000621 seconds (2 allocations: 32 bytes)
499921

julia> @time count(<(0.5), x) # <(0.5) is a new callable - it has to be compiled
  0.013751 seconds (7.71 k allocations: 456.232 KiB, 96.03% compilation time)
499921

julia> @time count(<(0.5), x) # <(0.5) is callable already compiled - we are fast
  0.000504 seconds (2 allocations: 32 bytes)
499921

julia> @time count(<(0.5), x) # <(0.5) is callable already compiled - we are fast
  0.000616 seconds (2 allocations: 32 bytes)

The point is that v -> v > 0.5 is a new function every time you write it, even if you used exactly the same definition - Julia has to create a new anonymous function if you introduce it in global scope. It can be easily seen here:

julia> v -> v > 0.5
#7 (generic function with 1 method)

julia> v -> v > 0.5
#9 (generic function with 1 method)

(note that the number increases - it is a different function)

Now have a look at >(0.5):

julia> >(0.5)
(::Base.Fix2{typeof(>), Float64}) (generic function with 1 method)

julia> >(0.5)
(::Base.Fix2{typeof(>), Float64}) (generic function with 1 method)

It is the same callable every time - so it has to be compiled only once.

Finally if you wrap things in a function, as DNF explained you have:

julia> test() = v -> v > 0.5
test (generic function with 1 method)

julia> test()
#11 (generic function with 1 method)

julia> test()
#11 (generic function with 1 method)

And as you can see as anonymous function is defined within a named function the compiler knows it is the same anonymous function every time, so the number does not increase (it had to be compiled only once - the first time test got called).


Regarding the strip issue. The difference is visible in @btime but not with @time as the cost of strip in @time is dwarfed by the cost of compilation so you are simply unable to see the difference then, but it is actually the same in both cases.

like image 78
Bogumił Kamiński Avatar answered Sep 21 '22 23:09

Bogumił Kamiński


Bogumił Kamiński's answer is excellent. I'm just writing this to comment on your solution.

Note that you can use the DelimitedFiles module from the standard library to read such files into matrices. Like this:

using DelimitedFiles
readdlm(filename, ' ', Float64; comments=true, comment_char='#')

But you may find this slower than your code because it reads data into a column-major matrix instead of a row-based vector of vectors. Which one is better depends on your needs. (And of course, there are many packages that can read delimited files into various structures.)

Regarding your solution, I suggest a few small changes that can improve performance and memory use:

  • Both readlines and filter allocate new vectors that you don't keep. To avoid these memory allocations, use the iterator interface provided by eachline and Iterators.filter.
  • Similarly, indexing [1:end-1] creates an unnecessary vector. Use view or the convenient @view macro to avoid an allocation.

In addition, I think it's clearer in this code to stick to either map or array comprehension instead of mixing the two.

The following code incorporates these changes (using map with do notation). In my test case, this improves speed by about 15% and memory use by about 30%:

results = map(Iterators.filter(!startswith('#'), eachline(filename))) do line
    map(@view split(line, ' ')[1:end-1]) do s
        parse(Float64, s)
    end
end

If you prefer array comprehension over map, the following is identical:

results = [
    [
        parse(Float64, s)
        for s in @view split(line, ' ')[1:end-1]
    ]
    for line in Iterators.filter(!startswith('#'), eachline(filename))
]

As you point out in the comments, we can use broadcasting to eliminate the explicit inner loop, leading to this cleaner code:

results = map(Iterators.filter(!startswith('#'), eachline(filename))) do line
    parse.(Float64, @view split(line, ' ')[1:end-1])
end

results = [
    parse.(Float64, @view split(line, ' ')[1:end-1])
    for line in Iterators.filter(!startswith('#'), eachline(filename))
]
like image 39
Vincent Yu Avatar answered Sep 18 '22 23:09

Vincent Yu