I am porting this Python code...
with open(filename, 'r') as f:
results = [np.array(line.strip().split(' ')[:-1], float)
for line in filter(lambda l: l[0] != '#', f.readlines())]
...to Julia. I came up with:
results = [map(ss -> parse(Float64, ss), split(s, " ")[1:end-1])
for s in filter(s -> s[1] !== '#', readlines(filename))];
The main reason for this porting is a potential performance gain, so I timed the two snippets in a Jupyter notebook. For the Python version, %%timeit on the snippet above reports:
12.8 ms ± 44.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
For the Julia version, @benchmark returns (among other things):
mean time: 8.250 ms (2.62% GC)
So far so good; I do get a performance boost. However, a plain @time gives:
0.103095 seconds (130.44 k allocations: 11.771 MiB, 91.58% compilation time)
From this thread I inferred that this was probably caused by my -> function declarations. Indeed, if I replace my code with:
filt = s -> s[1] !== '#';
pars = ss -> parse(Float64, ss);
res = [map(pars, split(s, " ")[1:end-1])
for s in filter(filt, readlines(filename))];
and time only the last line, I get a more encouraging 0.073007 seconds (60.58 k allocations: 7.988 MiB, 88.33% compilation time); hurray! However, this kind of defeats the purpose (at least as I understand it) of anonymous functions, and might lead to a bunch of f1, f2, f3, ... definitions. Giving a name to my Python lambda function outside of the list comprehension does not seem to affect Python's runtime.
My question is: to get normal performance, should I systematically name my Julia functions? Note that this particular snippet is to be called in a loop over ~30k files. (Basically, what I am doing is reading files that are mixtures of space-separated floats and comment lines; each float line can have a different length, and I am not interested in the last element of the line. Any comments on my solutions are appreciated.)
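For reference, a minimal self-contained Python sketch of the claim above (with hypothetical in-line sample data instead of a real file, and plain float lists instead of np.array for self-containment): naming the filter function gives exactly the same results as the lambda.

```python
# Hypothetical sample data: comment lines start with '#'; float lines are
# space-separated, may differ in length, and the last field is dropped.
lines = ["# header", "1.0 2.0 3.0 9.9", "4.5 5.5 9.9"]

# Anonymous filter, as in the original snippet.
anon = [[float(s) for s in l.split(' ')[:-1]]
        for l in filter(lambda l: l[0] != '#', lines)]

# The same with a named filter function.
def not_comment(l):
    return l[0] != '#'

named = [[float(s) for s in l.split(' ')[:-1]]
         for l in filter(not_comment, lines)]

print(anon)           # [[1.0, 2.0, 3.0], [4.5, 5.5]]
print(anon == named)  # True: naming the function does not change the result
```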
(Side comment: wrapping s with strip completely messes up @benchmark, adding 10 ms to the mean, but does not seem to affect @time. Any reason why?)
Putting everything in a function as suggested by DNF fixes my "have to name my anonymous functions" problem. Using one of Vincent Yu's formulations:
function results(filename::String)::Vector{Vector{Float64}}
[[parse(Float64, s) for s in @view split(line, ' ')[1:end-1]]
for line in Iterators.filter(!startswith('#'), eachline(filename))]
end
@benchmark results(FN)
BenchmarkTools.Trial:
memory estimate: 3.74 MiB
allocs estimate: 1465
--------------
minimum time: 7.108 ms (0.00% GC)
median time: 7.458 ms (0.00% GC)
mean time: 7.580 ms (1.58% GC)
maximum time: 9.538 ms (14.84% GC)
--------------
samples: 659
evals/sample: 1
@time called on this function returns equivalent results after the first compilation run. I am happy with that.
However, my issue with strip persists:
function results_strip(filename::String)::Vector{Vector{Float64}}
[[parse(Float64, s) for s in @view split(strip(line), ' ')[1:end-1]]
for line in Iterators.filter(!startswith('#'), eachline(filename))]
end
@benchmark results_strip(FN)
BenchmarkTools.Trial:
memory estimate: 3.74 MiB
allocs estimate: 1465
--------------
minimum time: 15.155 ms (0.00% GC)
median time: 15.742 ms (0.00% GC)
mean time: 15.885 ms (0.75% GC)
maximum time: 19.089 ms (10.02% GC)
--------------
samples: 315
evals/sample: 1
The median time doubles. If I look at strip only:
function only_strip(filename::String)
[strip(line) for line in Iterators.filter(!startswith('#'), eachline(filename))]
end
@benchmark only_strip(FN)
BenchmarkTools.Trial:
memory estimate: 1.11 MiB
allocs estimate: 475
--------------
minimum time: 223.868 μs (0.00% GC)
median time: 258.227 μs (0.00% GC)
mean time: 325.389 μs (9.41% GC)
maximum time: 56.024 ms (75.09% GC)
--------------
samples: 10000
evals/sample: 1
The figures just do not add up. Could there be a type mismatch? Should I cast the results to something else?
Anonymous functions are faster to write than named ones, but calling more functions is more expensive, to a degree that eclipses any savings you might get from using anonymous functions: each function call adds a frame to the call stack, which introduces a small but non-trivial amount of overhead.
The advantage of an anonymous function is that it does not have to be stored in a separate file. This can greatly simplify programs, as calculations are often very simple, and the use of anonymous functions reduces the number of code files necessary for a program.
In order to (hopefully) clearly summarize what Colin T Bowers and DNF commented: if you use BenchmarkTools (@benchmark or @btime), the time is always measured after compilation. If you just use @time, the computed time includes compilation. Actually, you get this information in the output (where you see the % of time spent in compilation).
The conclusion is: use BenchmarkTools when you want the post-compilation runtime, and @time only when the one-off compilation cost is itself of interest.
Let me give you a canonical example showing the problem that will hopefully help you understand the issue better:
julia> x = rand(10^6);
julia> @time count(v -> v < 0.5, x) # a lot of compilation as everything needs to be compiled
0.033077 seconds (18.34 k allocations: 1.047 MiB, 110.16% compilation time)
499921
julia> @time count(v -> v < 0.5, x) # v -> v < 0.5 is a new function - it has to be compiled
0.013155 seconds (5.85 k allocations: 322.655 KiB, 95.92% compilation time)
499921
julia> @time count(v -> v < 0.5, x) # v -> v < 0.5 is a new function - it has to be compiled
0.017371 seconds (5.85 k allocations: 322.702 KiB, 95.37% compilation time)
499921
julia> f(x) = x < 0.5
f (generic function with 1 method)
julia> @time count(f, x) # f is a new function - it has to be compiled
0.011609 seconds (5.82 k allocations: 321.351 KiB, 95.85% compilation time)
499921
julia> @time count(f, x) # f was already compiled - we are fast
0.000596 seconds (2 allocations: 32 bytes)
499921
julia> @time count(f, x) # f was already compiled - we are fast
0.000621 seconds (2 allocations: 32 bytes)
499921
julia> @time count(<(0.5), x) # <(0.5) is a new callable - it has to be compiled
0.013751 seconds (7.71 k allocations: 456.232 KiB, 96.03% compilation time)
499921
julia> @time count(<(0.5), x) # <(0.5) is callable already compiled - we are fast
0.000504 seconds (2 allocations: 32 bytes)
499921
julia> @time count(<(0.5), x) # <(0.5) is callable already compiled - we are fast
0.000616 seconds (2 allocations: 32 bytes)
The point is that v -> v > 0.5 is a new function every time you write it, even if you use exactly the same definition: Julia has to create a new anonymous function each time you introduce one in global scope. This is easily seen here:
julia> v -> v > 0.5
#7 (generic function with 1 method)
julia> v -> v > 0.5
#9 (generic function with 1 method)
(note that the number increases - it is a different function)
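A loosely analogous effect can be demonstrated in Python: each evaluation of a lambda expression also yields a fresh function object (although Python, unlike Julia, pays no per-function compilation cost for it):

```python
# Two evaluations of the same lambda text produce two distinct function
# objects, much like Julia's increasing #7, #9 numbering above.
f1 = lambda v: v > 0.5
f2 = lambda v: v > 0.5

print(f1 is f2)            # False: distinct objects
print(f1(0.7) == f2(0.7))  # True: identical behavior
```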
Now have a look at >(0.5):
julia> >(0.5)
(::Base.Fix2{typeof(>), Float64}) (generic function with 1 method)
julia> >(0.5)
(::Base.Fix2{typeof(>), Float64}) (generic function with 1 method)
It is the same callable every time - so it has to be compiled only once.
Finally if you wrap things in a function, as DNF explained you have:
julia> test() = v -> v > 0.5
test (generic function with 1 method)
julia> test()
#11 (generic function with 1 method)
julia> test()
#11 (generic function with 1 method)
And as you can see, since the anonymous function is defined within a named function, it is the same anonymous function every time, so the number does not increase (it had to be compiled only once, the first time test got called).
Regarding the strip issue: the difference is visible with @btime but not with @time, because under @time the cost of strip is dwarfed by the cost of compilation. You are simply unable to see the difference then, but it is actually the same in both cases.
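This @time-versus-@btime distinction can be mimicked in Python (a sketch with a hypothetical one-time setup cost simulated by time.sleep): a single timed first call includes the one-time cost, while timeit-style repetition amortizes it away, just as @benchmark amortizes Julia's compilation.

```python
import time
import timeit

_cache = {}

def count_below(xs, threshold):
    # Hypothetical stand-in for Julia's compilation: the first call for a
    # given threshold pays a one-time setup cost; later calls reuse it.
    if threshold not in _cache:
        time.sleep(0.05)  # simulate the one-time "compilation" cost
        _cache[threshold] = True
    return sum(1 for v in xs if v < threshold)

xs = [0.1, 0.7, 0.4, 0.9]

start = time.perf_counter()
first = count_below(xs, 0.5)          # like @time on a fresh function
first_call = time.perf_counter() - start

# Like @benchmark/@btime: the one-time cost is already paid, so the
# measured per-call time no longer contains it.
per_call = timeit.timeit(lambda: count_below(xs, 0.5), number=1000) / 1000
```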
Bogumił Kamiński's answer is excellent. I'm just writing this to comment on your solution.
Note that you can use the DelimitedFiles
module from the standard library to read such files into matrices. Like this:
using DelimitedFiles
readdlm(filename, ' ', Float64; comments=true, comment_char='#')
But you may find this slower than your code because it reads data into a column-major matrix instead of a row-based vector of vectors. Which one is better depends on your needs. (And of course, there are many packages that can read delimited files into various structures.)
Regarding your solution, I suggest a few small changes that can improve performance and memory use:
- readlines and filter allocate new vectors that you don't keep. To avoid these memory allocations, use the iterator interface provided by eachline and Iterators.filter.
- [1:end-1] creates an unnecessary vector. Use view or the convenient @view macro to avoid an allocation.
In addition, I think it's clearer in this code to stick to either map or array comprehension instead of mixing the two.
The following code incorporates these changes (using map with do notation). In my test case, this improves speed by about 15% and memory use by about 30%:
results = map(Iterators.filter(!startswith('#'), eachline(filename))) do line
map(@view split(line, ' ')[1:end-1]) do s
parse(Float64, s)
end
end
If you prefer array comprehension over map, the following is identical:
results = [
[
parse(Float64, s)
for s in @view split(line, ' ')[1:end-1]
]
for line in Iterators.filter(!startswith('#'), eachline(filename))
]
As you point out in the comments, we can use broadcasting to eliminate the explicit inner loop, leading to this cleaner code:
results = map(Iterators.filter(!startswith('#'), eachline(filename))) do line
parse.(Float64, @view split(line, ' ')[1:end-1])
end
Or, equivalently, as a comprehension:
results = [
parse.(Float64, @view split(line, ' ')[1:end-1])
for line in Iterators.filter(!startswith('#'), eachline(filename))
]