Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Trying to find a way to construct Julia `generator`

I'm new to Julia. I mainly program in python.

In python, if you want to iterate over a large set of values, it is typical to construct a so-called generator to save memory usage. Here is one example code:

def generator(N):
    for i in range(N):
        yield i

I wonder if there is anything alike in Julia. After reading julia manual, @task macro seems to have the same (or similar) functionality as generator in python. However, after some experiments, the memory usage seems to be larger than usual array in julia.

I use @time in IJulia to see the memory usage. Here is my sample code:

[Update]: Add the code for generator method
(The generator method)

function generator(N::Int)
    for i in 1:N
        produce(i)
    end
end

(generator version)

function fun_gener()
    sum = 0
    g = @task generator(100000)
    for i in g
        sum += i
    end
    sum
 end

@time fun_gener()
elapsed time: 0.420731828 seconds (6507600 bytes allocated)

(array version)

function fun_arry()
    sum = 0
    c = [1:100000]
    for i in c
        sum += i
    end
    sum
end

@time fun_arry()
elapsed time: 0.000629629 seconds (800144 bytes allocated)

Could anyone tell me why @task will require more space in this case? And if I want to save memory usage as dealing with a large set of values, what can I do?

like image 411
DboyLiao Avatar asked Mar 30 '15 05:03

DboyLiao


Video Answer


2 Answers

I recommend the "tricked out iterators" blogpost by Carl Vogel, which discusses julia's iterator protocol, tasks and co-routines in some detail.

See also task-aka-coroutines in the julia docs.


In this case you should use the Range type (which defines an iterator protocol):

julia> function fun_arry()
           sum = 0
           c = 1:100000  # remove the brackets, makes this a Range
           for i in c
               sum += i
           end
           sum
       end
fun_arry (generic function with 1 method)

julia> fun_arry()  # warm up
5000050000

julia> @time fun_arry()
elapsed time: 8.965e-6 seconds (192 bytes allocated)
5000050000

Faster and less memory allocated (just like xrange in python 2).

A snippet from the blogpost:

From https://github.com/JuliaLang/julia/blob/master/base/range.jl, here’s how a Range’s iterator protocol is defined:

start(r::Ranges) = 0
next{T}(r::Range{T}, i) = (oftype(T, r.start + i*step(r)), i+1)
next{T}(r::Range1{T}, i) = (oftype(T, r.start + i), i+1)
done(r::Ranges, i) = (length(r) <= i)

Notice that the next method calculates the value of the iterator in state i. This is different from an Array iterator, which just reads the element a[i] from memory.

Iterators that exploit delayed evaluation like this can have important performance benefits. If we want to iterate over the integers 1 to 10,000, iterating over an Array means we have to allocate about 80MB to hold it. A Range only requires 16 bytes; the same size as the range 1 to 100,000 or 1 to 100,000,000.


You can write a generator method (using Tasks):

julia> function generator(n)
          for i in 1:n      # Note: we're using a Range here!
              produce(i)
          end
       end
generator (generic function with 2 methods)

julia> for x in Task(() -> generator(3))
          println(x)
       end
1
2
3

Note: if you replace the Range with this, the performance is much poorer (and allocates way more memory):

julia> @time fun_arry()
elapsed time: 0.699122659 seconds (9 MB allocated)
5000050000
like image 117
Andy Hayden Avatar answered Sep 21 '22 03:09

Andy Hayden


This question was asked (and answered) quite a while ago. Since this question is ranked high on google searches, I'd like to mention that both the question and answer are outdated.

Nowadays, I'd suggest checking out https://github.com/BenLauwens/ResumableFunctions.jl for a Julia library with a macro that implements Python-like yield generators.

using ResumableFunctions

@resumable function fibonnaci(n::Int) :: Int
  a = 0
  b = 1
  for i in 1:n-1
    @yield a
    a, b = b, a+b
  end
  a
end

for fib in fibonnaci(10)
  println(fib)
end

Since its scope is much more limited than full coroutines, it is also an order of magnitude more efficient than pushing values into a channel since it can compile the generator into a FSM. (Channels have replaced the old produce() function mentioned in the question and previous answers).

With that said, I'd still suggest pushing into a channel as your first approach if performance isn't an issue, because resumablefunctions can sometimes be finicky when compiling your function and can occasionally hit some worst-case behaviour. In particular, because it is a macro that compiles to an FSM rather than a function, you currently need to annotate the types of all variables in the Resumablefunction to get good performance, unlike vanilla Julia functions where this is handled by JIT when the function is first called.

like image 24
saolof Avatar answered Sep 21 '22 03:09

saolof