Julia: function types and performance

Tags: performance, julia

Is there a way in Julia to generalise a pattern like the following?

function compute_sum(xs::Vector{Float64})
    res = 0
    for i in 1:length(xs)
        res += sqrt(xs[i])
    end
    res
end

This computes the square root of each vector element and then sums everything. It is much faster than the "naive" versions with an array comprehension or map, and it also doesn't allocate additional memory:

xs = rand(1000)

julia> @time compute_sum(xs)
  0.000004 seconds
676.8372556762225

julia> @time sum([sqrt(x) for x in xs])
  0.000013 seconds (3 allocations: 7.969 KiB)
676.837255676223

julia> @time sum(map(sqrt, xs))
  0.000013 seconds (3 allocations: 7.969 KiB)
676.837255676223

Unfortunately, the "obvious" generic version performs terribly:

function compute_sum2(xs::Vector{Float64}, fn::Function)
    res = 0
    for i in 1:length(xs)
        res += fn(xs[i])
    end
    res
end

julia> @time compute_sum2(xs, x -> sqrt(x))
  0.013537 seconds (19.34 k allocations: 1.011 MiB)
676.8372556762225

asked Sep 28 '20 11:09

cno

2 Answers

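The reason is that x -> sqrt(x) defines a new anonymous function, with a new type, each time the expression is evaluated at the top level. Julia compiles a specialized version of compute_sum2 for each function type it is called with, so every such call pays the compilation cost again.

If you define the function beforehand, e.g. like this: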

julia> f = x -> sqrt(x)

then you have:

julia> @time compute_sum2(xs, f) # here you pay compilation cost
  0.010053 seconds (19.46 k allocations: 1.064 MiB)
665.2469135020949

julia> @time compute_sum2(xs, f) # here you have already compiled everything
  0.000003 seconds (1 allocation: 16 bytes)
665.2469135020949

Note that a natural approach would be to define a named function, like this:

julia> g(x) = sqrt(x)
g (generic function with 1 method)

julia> @time compute_sum2(xs, g)
  0.000002 seconds
665.2469135020949

You can see that x -> sqrt(x) defines a fresh anonymous function each time it is encountered by checking its type, e.g.:

julia> typeof(x -> sqrt(x))
var"#3#4"

julia> typeof(x -> sqrt(x))
var"#5#6"

julia> typeof(x -> sqrt(x))
var"#7#8"

Note that this is different if the anonymous function is defined in a function body:

julia> h() = typeof(x -> sqrt(x))
h (generic function with 2 methods)

julia> h()
var"#11#12"

julia> h()
var"#11#12"

julia> h()
var"#11#12"

and you can see that this time the anonymous function has the same type every time.
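This suggests one way to keep the lambda syntax without recompiling on every call: create the anonymous function inside a function body. The following is only a minimal sketch. The wrapper name sum_of_sqrt is hypothetical and not part of the question, and compute_sum2 is repeated from the question so that the block runs on its own.

```julia
# compute_sum2 as defined in the question, repeated so the block is self-contained.
function compute_sum2(xs::Vector{Float64}, fn::Function)
    res = 0
    for i in 1:length(xs)
        res += fn(xs[i])
    end
    res
end

# Hypothetical wrapper (name is an assumption): the lambda is part of the
# function body, so it has the same type on every call and compute_sum2
# only needs to be specialized for it once.
sum_of_sqrt(xs) = compute_sum2(xs, x -> sqrt(x))

xs = [1.0, 4.0, 9.0]
sum_of_sqrt(xs)   # the first call includes compilation
sum_of_sqrt(xs)   # later calls reuse the compiled code
```

Timing the two calls with @time should show the same pattern as with f above: compilation cost on the first call only.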

answered Sep 22 '22 11:09

Bogumił Kamiński

In addition to the excellent response by Bogumił, I would just like to add that a very convenient way of generalizing this is to use the standard functional programming functions such as map, reduce, and fold (foldl and foldr in Julia).

In this case, you're doing a map transformation (namely sqrt) and a reduce (namely +), so you can also achieve the result with mapreduce(sqrt, +, xs). This has essentially no overhead and is comparable to a manual loop in performance.
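As a small illustration of the mapreduce call, here is a sketch on a three-element vector chosen so the result is easy to check by hand. The sum(sqrt, xs) form is not mentioned above; it is the method of sum in Base that takes a function as its first argument, shown here as an alternative spelling of the same computation.

```julia
xs = [1.0, 4.0, 9.0]

# Apply sqrt to each element and combine the results with +,
# without building an intermediate array.
total = mapreduce(sqrt, +, xs)

# sum also accepts a function as its first argument.
total_via_sum = sum(sqrt, xs)
```

Both calls avoid the temporary array that sum(map(sqrt, xs)) creates.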

If you have a really complicated series of transformations, you can get optimal performance and still write in a functional style by using the Transducers.jl package.

answered Sep 21 '22 11:09

Jakob Nissen
