Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Julia - nested loops are consuming a lot of memory

I have a nested loop iteration scheme that's taking up a lot of memory. I do understand that I should not have globals but even after I included everything in a function, the memory situation didn't improve. It just accumulates after each iteration as if there is no garbage collection.

Here is a workable example that's similar to my code.

I have two files. First, functions.jl:

##functions.jl
module functions

function getMatrix(A)
    L = rand(A,A);
    return L;
end

function loopOne(A, B)
    res = 0;
    for i = 1:B
        res = inv(getMatrix(A));
    end
    return(res);
end

end

Second file main.jl:

##main.jl
include("functions.jl")
function main(C)
    res = 0;
    A = 50;
    B = 30;
    for i =1:C
        mat = functions.loopOne(A,B);
        res = mat .+ 1;
    end
    return res;
end
main(100)

When I execute julia main.jl, the memory increases as I extend C in main(C) (sometimes to more than millions of allocations and 10GiBs when I increase C to 1000000).

I know that the example looks useless but it resembles the structure that I have. Can someone please help? Thank you.

UPDATE:

Michael K. Borregaard gave an answer which is very helpful:

module Functions                            #1

function loopOne!(res, mymatrix, B)         #2
    for i = 1:B
        res .= inv(rand!(mymatrix))         #3
end
    return res                              #4
end

end

function some_descriptive_name(C)           #5
    A, B = 50, 30                           #6
    res, mymat = zeros(A,A), zeros(A,A) 
    for i =1:C
        res .= Functions.loopOne!(res, mymat, B) .+ 1
    end
    return res

However, when I time it, allocations and memory still increase as I dial up C.

@time some_descriptive_name(30)
  0.057177 seconds (11.77 k allocations: 58.278 MiB, 9.58% gc time)

@time some_descriptive_name(60)
  0.113808 seconds (23.53 k allocations: 116.518 MiB, 9.63% gc time)

I believe that the problem comes from the inv function. If I change the code to:

function some_descriptive_name(C)           #5
    A, B = 50, 30                           #6
    res, mymat = zeros(A,A), zeros(A,A) 
    for i =1:C
       res .= res .+ 1
    end
    return res
end

The memory and allocations will then stay constant:

@time some_descriptive_name(3)
  0.000007 seconds (8 allocations: 39.438 KiB)

@time some_descriptive_name(60)
  0.000037 seconds (8 allocations: 39.438 KiB)

Is there a way to "clear" the memory after using inv? Since I'm not creating anything new or storing anything new, the memory usage should stay constant.

like image 558
tryingtosolve Avatar asked Dec 19 '22 04:12

tryingtosolve


1 Answers

A few pointers at least:

  1. The getMatrix function allocates a new AxA matrix every time. That will certainly consume memory. It is better to avoid the allocations if you can, e.g. by using rand! to fill an existing array with random values.

  2. The res = 0 line defines res as an Int but you subsequently assign a Matrix{Float} to it (the result of inv(getMatrix)). Changing the type of a variable in the code makes it hard for the compiler to figure out what the type is, which makes for slow code.

  3. It seems you have a module called functions but you don't write it.

  4. The res = inv code line constantly overwrites the value, so the loop does nothing!

  5. The structure and code looks like C++. Try looking at the style guide.

Here's how the code would look in a more ideomatic way that avoids allocations:

module Functions                            #1

function loopOne!(res, mymatrix, B)         #2
    for i = 1:B
        res .= inv(rand!(mymatrix))         #3
    end
    return res                              #4
end

end

function some_descriptive_name(C)           #5
    A, B = 50, 30                           #6
    res, mymat = zeros(A,A), zeros(A,A) 
    for i =1:C
        res .= Functions.loopOne!(res, mymat, B) .+ 1
    end
    return res
end

Comments:

  1. Use a module if you like - it's up to you whether to put things in different files. Module name in caps.

  2. If you can, it's an advantage to use functions that overwrite values of an existing container. Such functions end with ! to signal that they will modify the arguments (like passing a variably by reference without making it const in C++).

  3. Use the .= operator to indicate that you're not creating a new container, you're overwriting the elements of the existing one. The rand! function overwrites mymatrix.

  4. The return keyword is not strictly needed, but DNF suggested it is better style in a comment

  5. The main convention isn't used in Julia, as most code gets called by the user, not by execution of a program.

  6. Compact assignment format for multiple variables.

Note that in this case, none of these optimisation matter much, as 99% of the calculation time is spent in the expensive inv function.

RESPONSE TO THE UPDATE: There's nothing wrong with the inv function, it just is a costly operation to do. But again I think you may be misunderstanding what the memory counting does. It is not that the memory use is increasing, like you would be looking for in C++ if you had a pointer to an object that was never released (a memory leak). The memory use is constant, but the total sum of allocations increase, because the inv function has to make some internal allocations.

Consider this example

for i in 1:n
    b = [1, 2, 3, 4]  # Here a length-4 Array{Int64} is initialized in memory, cost is 32 bytes
end                   # Here, that memory is released.

For each run through the for loop, 32 bytes is allocated, and 32 bytes is released. When the loop ends, regardless of n, 0 bytes will be allocated from this operation. But Julia's memory tracking only adds the allocations - so after running the code you will see allocation of 32*n bytes. The reason julia does this is that allocating space in the RAM is one of the costliest operations in computing - so reducing allocations is a good way to speed up code. But you cannot avoid it.

There is thus nothing wrong with your code (in the new format) - the memory allocation and time taken you see is just a result of doing a big (expensive) operation.

like image 129
Michael K. Borregaard Avatar answered Dec 29 '22 12:12

Michael K. Borregaard