I ran julia --track-allocation prof.jl
resulting in the following output:
- using FixedSizeArrays
-
- immutable KernelVals{T}
- wavenumber::T
- vect::Vec{3,T}
- dist::T
- green::Complex{T}
- gradgreen::Vec{3,Complex{T}}
- end
-
- function kernelvals(k, x, y)
- r = x - y
0 R2 = r[1]*r[1]
0 R2 += r[2]*r[2]
0 R2 += r[3]*r[3]
0 R = sqrt(R2)
-
0 γ = im*k
0 expn = exp(-γ * R)
0 fctr = 1.0 / (4.0*pi*R)
0 green = fctr * expn
64 gradgreen = -(γ + 1/R) * green / R * r
-
0 KernelVals(k, r, R, green, gradgreen)
- end
-
- function payload()
- x = Vec{3,Float64}(0.47046262275611883,0.8745228524771103,-0.049820876498487966)
0 y = Vec{3,Float64}(-0.08977259509004082,0.543199687600189,0.8291184043296924)
0 k = 1.0
0 kv = kernelvals(k,x,y)
- return kv
- end
-
- function driver()
- println("Flush result: ", payload())
0 Profile.clear_malloc_data()
0 payload()
- end
-
- driver()
I cannot get rid of the final memory allocation on the line starting gradgreen...
. I ran @code_warntype kernelsvals(...)
, revealing no type instability or uncertainty.
The allocation pattern is identical on julia-0.4.6
and julia-0.5.0-pre
.
This function will be the inner kernel in a boundary element method I am implementing. It will be called literally millions of times, resulting in a gross memory allocation that can grow to be a multiple of the physical memory available to me.
The reason I am using FixedSizeArrays
is to avoid allocations related to the creation of small Array
s.
The precise location where the allocation is reported depends in a very sensitive manner on the code. At some point the memory profiler was blaming 1/(4*pi*R)
as the line triggering allocation.
Any help or general tips on how to write code resulting in predictable allocation patterns is highly appreciated.
First, overload the “new” operator in the class. So that, when we create object using “new”, it will not request memory from operating system but will call overloaded “new” method from the class itself for memory allocation. Make the overload new method private to make it in-accessible from outside of the class.
Because the heap is a far more complicated data structure than the stack. For many architectures, allocating memory on the stack is just a matter of changing the stack pointer, i.e. it's one instruction.
When the heap becomes full, garbage is collected. During the garbage collection objects that are no longer used are cleared, thus making space for new objects.
Heap allocation is the most flexible allocation scheme. Allocation and deallocation of memory can be done at any time and any place depending upon the user's requirement. Heap allocation is used to allocate memory to the variables dynamically and when the variables are no more used then claim it back.
After some experiments I finally managed to get rid of all allocations. The culprit turned out to be the promotion architecture as extended in FixedSizeArrays
. Apparently multiplying a complex scalar and a real vector creates a temporary along the way.
Replacing the definition of gradgreen
with
c = -(γ + 1/R) * green / R
gradgreen = Vec(c*r[1], c*r[2], c*r[3])
results in allocation-free runs. In my benchmark example execution time came down from 6.5 seconds to 4.15 seconds. Total allocation size from 4.5 GB to 1.4 GB.
EDT: Reported this issue to FixedSizeArrays
developers, who fixed it immediately (thank you!). Allocations disappeared completely.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With