Optimal (broadcasted) matrix division in numpy. Avoiding temporary arrays or not?

Question

Numpy allows arrays of different shapes to be added/multiplied/divided, provided certain broadcasting rules are followed. Also, creating temporary arrays is a major speed impediment in numpy.

The following timeit results surprise me... what is going on?

In [41]: def f_no_dot(mat,arr):
   ....:     return mat/arr

In [42]: def f_dot(mat,arr):
   ....:     denominator = scipy.dot(arr, scipy.ones((1,2)))
   ....:     return mat/denominator

In [43]: mat = scipy.rand(360000,2)

In [44]: arr = scipy.rand(360000,1)

In [45]: timeit temp = f_no_dot(mat,arr)
10 loops, best of 3: 44.7 ms per loop

In [46]: timeit temp = f_dot(mat,arr)
100 loops, best of 3: 10.1 ms per loop

I thought that f_dot would be slower since it had to create the temporary array denominator, and I assumed that this step was skipped by f_no_dot. I should note that these times scale linearly (with array size, up to length 1 billion) for f_no_dot, and slightly worse than linear for f_dot.

Joe Kington · Accepted Answer

I thought that f_dot would be slower since it had to create the temporary array denominator, and I assumed that this step was skipped by f_no_dot.

For what it's worth, creating the temporary array is skipped, which is why f_no_dot is slower (but uses less memory).

Element-wise operations on arrays of the same size are faster, because numpy doesn't have to worry about the striding (dimensions, size, etc) of the arrays.
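To make the striding point concrete, here is a minimal sketch on small made-up arrays (not the benchmark data). It uses np.broadcast_to to show that the (n, 1) divisor can be viewed as (n, 2) without copying: the broadcast axis simply gets a stride of 0. Whether the ufunc machinery takes exactly this path internally is an assumption; the view only illustrates the idea.

```python
import numpy as np

# Small illustrative arrays; the values are arbitrary.
x = np.arange(8, dtype=np.float64).reshape(4, 2) + 1.0
y = np.array([[1.0], [2.0], [4.0], [8.0]])

# Read-only view of y with x's shape. No data is copied:
# the stride along the broadcast (second) axis is 0.
y_view = np.broadcast_to(y, x.shape)
print(y_view.strides)

# Explicit full-size copy, like the temporary that f_dot builds.
y_full = np.dot(y, np.ones((1, 2)))
print(y_full.strides)

# Both routes give the same quotient.
same = np.array_equal(x / y, x / y_full)
print(same)
```

The view shares memory with y, whereas y_full is a separate contiguous array that lets the division run over two operands with identical layout.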

Operations that use broadcasting will generally be a bit slower than operations that don't have to.

If you have the memory to spare, creating a temporary copy can give you a speedup, but will use more memory.

For example, comparing these three functions:

import numpy as np
import timeit

def f_no_dot(x, y):
    return x / y

def f_dot(x, y):
    denom = np.dot(y, np.ones((1,2)))
    return x / denom

def f_in_place(x, y):
    x /= y
    return x

num = 3600000
x = np.ones((num, 2))
y = np.ones((num, 1))


for func in ['f_dot', 'f_no_dot', 'f_in_place']:
    t = timeit.timeit('%s(x,y)' % func, number=100,
            setup='from __main__ import x,y,f_dot, f_no_dot, f_in_place')
    print(func, 'time...')
    print(t / 100.0)

This yields similar timings to your results:

f_dot time...
0.184361531734
f_no_dot time...
0.619203259945
f_in_place time...
0.585789341927

However, if we compare the memory usage, things become a bit clearer...

The combined size of our x and y arrays is about 27.5 + 55 MB, or 82 MB (for 64-bit floats, which is what np.ones creates by default). There's an additional ~11 MB of overhead from importing numpy, etc.

Returning x / y as a new array (i.e. not doing x /= y) will require another 55 MB array.
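The sizes quoted above can be checked directly from the arrays' nbytes attribute. This is only a quick sketch of the arithmetic; it assumes the same num as the benchmark and reports sizes in units of 2**20 bytes.

```python
import numpy as np

num = 3600000
x = np.ones((num, 2))  # float64 by default, 8 bytes per element
y = np.ones((num, 1))

MiB = 2 ** 20
x_mib = x.nbytes / MiB
y_mib = y.nbytes / MiB

print(x.dtype)
print(round(x_mib, 1), round(y_mib, 1), round(x_mib + y_mib, 1))
```

Any new (num, 2) result or temporary costs the same as x again, which is where the extra 55 MB terms in the estimates come from.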

100 runs of f_dot: We're creating a temporary array here, so we'd expect to see 11 + 82 + 55 + 55 MB or ~203 MB of memory usage. And that's what we see...
[Figure: memory usage during 100 runs of f_dot]

100 runs of f_no_dot: If no temporary array is created, we'd expect a peak memory usage of 11 + 82 + 55 MB, or 148 MB...
[Figure: memory usage during 100 runs of f_no_dot]
...which is exactly what we see.

So, x / y is not creating an additional num x 2 temporary array to do the division.

Thus, the division takes quite a bit longer than it would if it were operating on two arrays of the same size.
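If you want to reproduce the memory comparison without an external profiler, one option is the standard-library tracemalloc module, which NumPy reports its array allocations to. The sketch below is not how the measurements above were made; peak_bytes is a hypothetical helper, and the array size is reduced to keep it quick.

```python
import tracemalloc
import numpy as np

def f_no_dot(x, y):
    return x / y

def f_dot(x, y):
    denom = np.dot(y, np.ones((1, 2)))
    return x / denom

def peak_bytes(func, *args):
    """Hypothetical helper: peak traced allocation while func runs."""
    tracemalloc.start()
    result = func(*args)  # keep the result alive until the peak is read
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return peak

num = 360000  # smaller than the benchmark above
x = np.ones((num, 2))
y = np.ones((num, 1))

peak_no_dot = peak_bytes(f_no_dot, x, y)
peak_dot = peak_bytes(f_dot, x, y)

print('f_no_dot peak MiB:', peak_no_dot / 2 ** 20)
print('f_dot peak MiB:', peak_dot / 2 ** 20)
```

Only allocations made after tracemalloc.start() are counted, so x and y themselves are excluded. The f_dot peak should come out at roughly twice the f_no_dot peak, because the temporary and the result are both alive at the same time.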

100 runs of f_in_place: If we can modify x in-place, we can save even more memory, if that's the main concern.
[Figure: memory usage during 100 runs of f_in_place]
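One thing to keep in mind with the in-place version is that it overwrites the caller's array. A small sketch with made-up values, using the ufunc out argument (which has the same effect as x /= y here):

```python
import numpy as np

x = np.full((4, 2), 8.0)
y = np.array([[1.0], [2.0], [4.0], [8.0]])

# Write the broadcast quotient straight back into x; no new
# (4, 2) result array is allocated.
result = np.divide(x, y, out=x)

print(result is x)
print(x)
```

Because x is modified, calling such a function repeatedly (as a timeit loop does) keeps dividing the already-divided data. That is harmless in the benchmark above only because both arrays are filled with ones.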

Basically, numpy tries to conserve memory at the expense of speed, in some cases.

Tags: performance, python, numpy

Ian Langmore
