Numpy allows matrices of different sizes to be added/multiplied/divided provided certain broadcasting rules are followed. Also, creation of temporary arrays is a major speed impediment to numpy.
The following timit results surprise me...what is going on?
In [41]: def f_no_dot(mat,arr):
....: return mat/arr
In [42]: def f_dot(mat,arr):
....: denominator = scipy.dot(arr, scipy.ones((1,2)))
....: return mat/denominator
In [43]: mat = scipy.rand(360000,2)
In [44]: arr = scipy.rand(360000,1)
In [45]: timeit temp = f_no_dot(mat,arr)
10 loops, best of 3: 44.7 ms per loop
In [46]: timeit temp = f_dot(mat,arr)
100 loops, best of 3: 10.1 ms per loop
I thought that f_dot would be slower since it had to create the temporary array denominator, and I assumed that this step was skipped by f_no_dot. I should note that these times scale linearly (with array size, up to length 1 billion) for f_no_dot, and slightly worse than linear for f_dot.
I thought that f_dot would be slower since it had to create the temporary array denominator, and I assumed that this step was skipped by f_no_dot.
For what it's worth, creating the temporary array is skipped, which is why f_no_dot
is slower (but uses less memory).
Element-wise operations on arrays of the same size are faster, because numpy doesn't have to worry about the striding (dimensions, size, etc) of the arrays.
Operations that use broadcasting will generally be a bit slower than operations that don't have to.
If you have the memory to spare, creating a temporary copy can give you a speedup, but will use more memory.
For example, comparing these three functions:
import numpy as np
import timeit
def f_no_dot(x, y):
return x / y
def f_dot(x, y):
denom = np.dot(y, np.ones((1,2)))
return x / denom
def f_in_place(x, y):
x /= y
return x
num = 3600000
x = np.ones((num, 2))
y = np.ones((num, 1))
for func in ['f_dot', 'f_no_dot', 'f_in_place']:
t = timeit.timeit('%s(x,y)' % func, number=100,
setup='from __main__ import x,y,f_dot, f_no_dot, f_in_place')
print func, 'time...'
print t / 100.0
This yields similar timings to your results:
f_dot time...
0.184361531734
f_no_dot time...
0.619203259945
f_in_place time...
0.585789341927
However, if we compare the memory usage, things become a bit clearer...
The combined size of our x
and y
arrays is about 27.5 + 55 MB, or 82 MB (for 64-bit ints). There's an additional ~11 MB of overhead in import numpy, etc.
Returning x / y
as a new array (i.e. not doing x /= y
) will require another 55 MB array.
100 runs of f_dot
:
We're creating a temporary array here, so we'd expect to see 11 + 82 + 55 + 55 MB or ~203 MB of memory usage. And, that's what we see...
100 runs of f_no_dot
:
If no temporary array is created, we'd expect a peak memory usage of 11 + 82 + 55 MB, or 148 MB...
...which is exactly what we see.
So, x / y
is not creating an additional num x 2
temporary array to do the division.
Thus, the division takes a quite a bit longer than it would if it were operating on two arrays of the same size.
100 runs of f_in_place
:
If we can modify x
in-place, we can save even more memory, if that's the main concern.
Basically, numpy tries to conserve memory at the expense of speed, in some cases.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With