I'm new to Julia. I mainly program in python.
In python, if you want to iterate over a large set of values, it is typical to construct a so-called generator to save memory usage. Here is one example code:
def generator(N):
for i in range(N):
yield i
I wonder if there is anything alike in Julia. After reading julia manual, @task macro seems to have the same (or similar) functionality as generator in python. However, after some experiments, the memory usage seems to be larger than usual array in julia.
I use @time
in IJulia to see the memory usage.
Here is my sample code:
[Update]: Add the code for generator
method
(The generator
method)
function generator(N::Int)
for i in 1:N
produce(i)
end
end
(generator version)
function fun_gener()
sum = 0
g = @task generator(100000)
for i in g
sum += i
end
sum
end
@time fun_gener()
elapsed time: 0.420731828 seconds (6507600 bytes allocated)
(array version)
function fun_arry()
sum = 0
c = [1:100000]
for i in c
sum += i
end
sum
end
@time fun_arry()
elapsed time: 0.000629629 seconds (800144 bytes allocated)
Could anyone tell me why @task
will require more space in this case?
And if I want to save memory usage as dealing with a large set of values,
what can I do?
I recommend the "tricked out iterators" blogpost by Carl Vogel, which discusses julia's iterator protocol, tasks and co-routines in some detail.
See also task-aka-coroutines in the julia docs.
In this case you should use the Range type (which defines an iterator protocol):
julia> function fun_arry()
sum = 0
c = 1:100000 # remove the brackets, makes this a Range
for i in c
sum += i
end
sum
end
fun_arry (generic function with 1 method)
julia> fun_arry() # warm up
5000050000
julia> @time fun_arry()
elapsed time: 8.965e-6 seconds (192 bytes allocated)
5000050000
Faster and less memory allocated (just like xrange
in python 2).
A snippet from the blogpost:
From https://github.com/JuliaLang/julia/blob/master/base/range.jl, here’s how a Range’s iterator protocol is defined:
start(r::Ranges) = 0 next{T}(r::Range{T}, i) = (oftype(T, r.start + i*step(r)), i+1) next{T}(r::Range1{T}, i) = (oftype(T, r.start + i), i+1) done(r::Ranges, i) = (length(r) <= i)
Notice that the next method calculates the value of the iterator in state i. This is different from an Array iterator, which just reads the element a[i] from memory.
Iterators that exploit delayed evaluation like this can have important performance benefits. If we want to iterate over the integers 1 to 10,000, iterating over an Array means we have to allocate about 80MB to hold it. A Range only requires 16 bytes; the same size as the range 1 to 100,000 or 1 to 100,000,000.
You can write a generator method (using Tasks):
julia> function generator(n)
for i in 1:n # Note: we're using a Range here!
produce(i)
end
end
generator (generic function with 2 methods)
julia> for x in Task(() -> generator(3))
println(x)
end
1
2
3
Note: if you replace the Range with this, the performance is much poorer (and allocates way more memory):
julia> @time fun_arry()
elapsed time: 0.699122659 seconds (9 MB allocated)
5000050000
This question was asked (and answered) quite a while ago. Since this question is ranked high on google searches, I'd like to mention that both the question and answer are outdated.
Nowadays, I'd suggest checking out https://github.com/BenLauwens/ResumableFunctions.jl for a Julia library with a macro that implements Python-like yield generators.
using ResumableFunctions
@resumable function fibonnaci(n::Int) :: Int
a = 0
b = 1
for i in 1:n-1
@yield a
a, b = b, a+b
end
a
end
for fib in fibonnaci(10)
println(fib)
end
Since its scope is much more limited than full coroutines, it is also an order of magnitude more efficient than pushing values into a channel since it can compile the generator into a FSM. (Channels have replaced the old produce() function mentioned in the question and previous answers).
With that said, I'd still suggest pushing into a channel as your first approach if performance isn't an issue, because resumablefunctions can sometimes be finicky when compiling your function and can occasionally hit some worst-case behaviour. In particular, because it is a macro that compiles to an FSM rather than a function, you currently need to annotate the types of all variables in the Resumablefunction to get good performance, unlike vanilla Julia functions where this is handled by JIT when the function is first called.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With