The speed bottleneck in my code is a tight double for loop over the elements of two arrays, x and y. A standard HPC trick to improve performance is to process the loop in chunks so that cache misses are minimized. I am trying to use Python generators to do the chunking, but needing to continually recreate the spent generator within the outer for loop is killing my runtime.
Question:
Is there a more intelligent algorithm for constructing the appropriate generator for performing chunked double-for loops?
Concrete illustration:
I'll create two dummy arrays, x and y. I'll keep them short for illustration, but in practice these are numpy arrays with ~1e6 elements.
x = np.array(['a', 'b', 'b', 'c', 'c', 'd'])
y = np.array(['e', 'f', 'f', 'g'])
The naive double for loop would just be:
for xletter in x:
    for yletter in y:
        pass  # algebraic manipulations on x & y
Now let's use generators to do this loop in chunks:
chunk_size = 3
xchunk_gen = (x[i: i+chunk_size] for i in range(0, len(x), chunk_size))
for xchunk in xchunk_gen:
    ychunk_gen = (y[i: i+chunk_size] for i in range(0, len(y), chunk_size))
    for ychunk in ychunk_gen:
        for xletter in xchunk:
            for yletter in ychunk:
                pass  # algebraic manipulations on x & y
Note that in order to implement a generator solution to this problem, I have to continually recreate ychunk_gen within the outer loop. Since y is a large array, this is killing my runtime (for ~1e6 elements, creating this generator takes ~20 ms on my laptop).
Is there a cleverer way to construct my generators that gets around this problem? Or will it be necessary to abandon the generator solution altogether?
(Note: in practice I am using Cython to perform this tight loop, but all of the above applies regardless.)
In my opinion, the creation of your generator expression is killing your running time because it is not optimized by Cython.
A better solution, which takes care of all the cache-optimization details, is to use numexpr. Since the manipulations of x and y are algebraic, it should fit your constraints very well (numexpr can do a little more).
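As a rough sketch of what that could look like (the arrays and the expression here are hypothetical stand-ins, since the actual manipulation isn't shown; numexpr works on numeric arrays and supports broadcasting, so the double loop over all (x, y) pairs can become one 2-D expression):
import numpy as np
import numexpr as ne

# Small numeric stand-ins; a full 1e6 x 1e6 pairwise result would not
# fit in memory, so real code would still evaluate it in row blocks.
x = np.arange(6, dtype=np.float64)
y = np.arange(4, dtype=np.float64)

# Broadcast to 2-D so the expression covers every (xletter, yletter) pair.
x2 = x[:, np.newaxis]  # shape (6, 1)
y2 = y[np.newaxis, :]  # shape (1, 4)

# numexpr compiles the expression once and evaluates it blockwise over
# cache-sized chunks, in parallel threads, so no manual chunking is needed.
result = ne.evaluate("x2 * y2 + 2 * x2")  # placeholder algebraic expression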
You are defining ychunk_gen all over again within the xchunk loop. Maybe the following will be faster:
chunk_size = 3
xchunk_gen = (x[i: i+chunk_size] for i in range(0, len(x), chunk_size))

def ychunk_gen(some_dependency_on_outer_loop):
    # use some_dependency_on_outer_loop
    for i in range(0, len(y), chunk_size):
        yield y[i: i+chunk_size]

for xchunk in xchunk_gen:
    for ychunk in ychunk_gen(chunk_or_something_else):
        for xletter in xchunk:
            for yletter in ychunk:
                pass  # algebraic manipulations on x & y
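If the y-chunks do not actually depend on the outer loop, a simpler variant (my sketch, not part of the suggestion above) is to materialize the chunks once and reuse the list on every outer iteration:
ychunks = [y[i: i+chunk_size] for i in range(0, len(y), chunk_size)]

for xchunk in xchunk_gen:
    for ychunk in ychunks:  # reuse the precomputed chunk list
        for xletter in xchunk:
            for yletter in ychunk:
                pass  # algebraic manipulations on x & y
This pays the chunking cost once instead of once per outer iteration, and since numpy slices are views, the list adds almost no memory overhead.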
But perhaps there is an even better way: I assume x and y are numpy arrays, so you can reshape the arrays and then loop over the rows:
# Note: this assumes len(x) and len(y) are exact multiples of chunk_size
# (the example y above, with 4 elements and chunk_size = 3, would need
# padding or a separate remainder loop).
for xchunk in x.reshape((len(x)//chunk_size, chunk_size)):
    for ychunk in y.reshape((len(y)//chunk_size, chunk_size)):
        pass  # the letter loops
In http://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html I read that if you want the data not to be copied by reshape, you should change the shape property of the array instead:
x.shape = len(x)//chunk_size, chunk_size
y.shape = len(y)//chunk_size, chunk_size
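A quick way to see the difference (a small sketch, assuming a contiguous 1-D array): reshape returns a view when it can, while assigning to shape changes the array in place and raises an error if a copy would be required.
import numpy as np

a = np.arange(6)
b = a.reshape((2, 3))
print(b.base is a)  # True: b is a view, no data was copied

a.shape = (2, 3)    # in-place shape change; never copies the data
                    # (raises AttributeError if a copy would be needed)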