Optimising away residual heap allocation in Julia

Tags: optimization, profiling, allocation, julia

I ran `julia --track-allocation prof.jl`, which produced the following per-line report (bytes allocated in the left column, `-` for lines without a count):

    - using FixedSizeArrays
    - 
    - immutable KernelVals{T}
    -     wavenumber::T
    -     vect::Vec{3,T}
    -     dist::T
    -     green::Complex{T}
    -     gradgreen::Vec{3,Complex{T}}
    - end
    - 
    - function kernelvals(k, x, y)
    -     r = x - y
    0     R2 =  r[1]*r[1]
    0     R2 += r[2]*r[2]
    0     R2 += r[3]*r[3]
    0     R = sqrt(R2)
    - 
    0     γ = im*k
    0     expn = exp(-γ * R)
    0     fctr = 1.0 / (4.0*pi*R)
    0     green = fctr * expn
   64     gradgreen = -(γ + 1/R) * green / R * r
    - 
    0     KernelVals(k, r, R, green, gradgreen)
    - end
    - 
    - function payload()
    -   x = Vec{3,Float64}(0.47046262275611883,0.8745228524771103,-0.049820876498487966)
    0   y = Vec{3,Float64}(-0.08977259509004082,0.543199687600189,0.8291184043296924)
    0   k = 1.0
    0   kv = kernelvals(k,x,y)
    -   return kv
    - end
    - 
    - function driver()
    -   println("Flush result: ", payload())
    0   Profile.clear_malloc_data()
    0   payload()
    - end
    - 
    - driver()

I cannot get rid of the final memory allocation on the line starting with `gradgreen`. Running `@code_warntype kernelvals(...)` revealed no type instability or uncertainty.
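For a quick check of a single call, `@allocated` can be handier than a full `--track-allocation` run. The following is a minimal sketch that assumes Julia 1.x rather than the 0.4/0.5 versions discussed here. `green` and `allocs_after_warmup` are hypothetical names, and only the scalar Green's-function part of `kernelvals` is reproduced. As in `driver()`, a warm-up call comes before the measurement.

```julia
# Hypothetical stand-alone version of the scalar part of kernelvals:
# the Green's function exp(-γR) / (4πR) with γ = im*k.
function green(k::Float64, R::Float64)
    γ = im * k
    return exp(-γ * R) / (4.0 * pi * R)
end

# Bytes allocated by one call of green. Measuring inside a function avoids
# spurious allocations caused by untyped global variables.
function allocs_after_warmup(k::Float64, R::Float64)
    green(k, R)                      # warm-up call, mirrors the first payload() in driver()
    return @allocated green(k, R)    # 0 means the call did not touch the heap
end

println("bytes allocated: ", allocs_after_warmup(1.0, 1.2))
```

The result is returned as an `isbits` `ComplexF64`, so a type-stable scalar kernel like this is expected to report zero bytes.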

The allocation pattern is identical on `julia-0.4.6` and `julia-0.5.0-pre`.

This function will be the inner kernel of a boundary element method I am implementing. It will be called millions of times, so the cumulative memory allocation can grow to several times the physical memory available to me.
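The per-call figure adds up quickly. Here is a back-of-the-envelope sketch. The call count is a hypothetical value chosen for illustration, not one taken from the question:

```julia
# Cumulative heap traffic produced by a fixed per-call allocation.
bytes_per_call = 64        # bytes reported by --track-allocation on the gradgreen line
calls = 10^8               # hypothetical number of kernel evaluations (illustrative only)
total_gb = bytes_per_call * calls / 1e9
println("cumulative allocation: ", total_gb, " GB")
```

The garbage collector reclaims these temporaries, so the total reflects cumulative allocation rather than peak memory use. Every collection still costs run time, though.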

I am using `FixedSizeArrays` precisely to avoid the allocations that come with creating small `Array`s.

The precise line where the allocation is reported is very sensitive to small changes in the code. At one point the memory profiler blamed `1/(4*pi*R)` as the line triggering the allocation.

Any help, or general tips on writing code with predictable allocation patterns, is highly appreciated.


asked Jul 24 '16 14:07

krcools

1 Answer

After some experiments I finally managed to get rid of all allocations. The culprit turned out to be the promotion machinery as extended in `FixedSizeArrays`: multiplying a complex scalar by a real vector apparently creates a temporary along the way.

Replacing the definition of `gradgreen` with

    c = -(γ + 1/R) * green / R               # complex scalar prefactor
    gradgreen = Vec(c*r[1], c*r[2], c*r[3])  # build the result componentwise

results in allocation-free runs. In my benchmark, execution time came down from 6.5 s to 4.15 s, and total allocation from 4.5 GB to 1.4 GB.
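The same componentwise idea can be sketched in current Julia without any package. This is a minimal sketch that assumes Julia 1.x. `Vec3` is a hypothetical stand-in for `Vec{3,T}`, and `gradgreen_componentwise` and `check_allocs` are hypothetical names. The sketch computes the complex prefactor once, builds the result from three scalar complex-by-real products, and then uses `@allocated` to check that a call does not touch the heap.

```julia
# Minimal immutable 3-vector (hypothetical stand-in for FixedSizeArrays' Vec{3,T}).
# With isbits fields it is itself isbits, so it needs no heap allocation.
struct Vec3{T}
    x::T
    y::T
    z::T
end

Base.:-(a::Vec3, b::Vec3) = Vec3(a.x - b.x, a.y - b.y, a.z - b.z)

# Gradient of the Green's function, built componentwise:
# one complex prefactor c, then three complex*real products.
function gradgreen_componentwise(k::Float64, x::Vec3{Float64}, y::Vec3{Float64})
    r = x - y
    R = sqrt(r.x^2 + r.y^2 + r.z^2)
    γ = im * k
    green = exp(-γ * R) / (4.0 * pi * R)
    c = -(γ + 1 / R) * green / R
    return Vec3(c * r.x, c * r.y, c * r.z)    # Vec3{ComplexF64}
end

# Bytes allocated by one call, measured after a warm-up call.
function check_allocs()
    x = Vec3(0.47046262275611883, 0.8745228524771103, -0.049820876498487966)
    y = Vec3(-0.08977259509004082, 0.543199687600189, 0.8291184043296924)
    gradgreen_componentwise(1.0, x, y)        # warm-up call
    return @allocated gradgreen_componentwise(1.0, x, y)
end

println("bytes allocated: ", check_allocs())
```

`check_allocs()` should return 0. In a package-based version, `SVector{3,Float64}` from StaticArrays.jl plays the role that `Vec{3,Float64}` plays here.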

EDIT: I reported this issue to the `FixedSizeArrays` developers, who fixed it immediately (thank you!). With their fix, the allocations disappeared completely.


answered Oct 23 '22 13:10

krcools
