I am doing something like following code and I am not happy with performance of np.roll() function. I am summing baseArray and otherArray, where baseArray is rolled by one element in each iteration. But I do not need the copy of the baseArray when I roll it, I would rather prefer a view such that for example when I sum baseArray with other array and if baseArray was rolled twice then the 2nd element of basearray is summed with 0th element of otherArray, 3rd element of baseArray is summed with 1st of otherArray etc.
I.E. to achieve the same result as with np.roll() but without copying the array.
import numpy as np
from numpy import random
import cProfile
def profile():
baseArray = np.zeros(1000000)
for i in range(1000):
baseArray= np.roll(baseArray,1)
otherArray= np.random.rand(1000000)
baseArray=baseArray+otherArray
cProfile.run('profile()')
output (note 3rd row - the roll function):
9005 function calls in 26.741 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 5.123 5.123 26.740 26.740 <ipython-input-101-9006a6c0d2e3>:5(profile)
1 0.001 0.001 26.741 26.741 <string>:1(<module>)
1000 0.237 0.000 8.966 0.009 numeric.py:1327(roll)
1000 0.004 0.000 0.005 0.000 numeric.py:476(asanyarray)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1000 12.650 0.013 12.650 0.013 {method 'rand' of 'mtrand.RandomState' objects}
1000 0.005 0.000 0.005 0.000 {method 'reshape' of 'numpy.ndarray' objects}
1000 6.390 0.006 6.390 0.006 {method 'take' of 'numpy.ndarray' objects}
2000 1.345 0.001 1.345 0.001 {numpy.core.multiarray.arange}
1000 0.001 0.000 0.001 0.000 {numpy.core.multiarray.array}
1000 0.985 0.001 0.985 0.001 {numpy.core.multiarray.concatenate}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.zeros}
1 0.000 0.000 0.000 0.000 {range}
I'm pretty sure it's impossible to avoid a copy due to the way in which numpy arrays are represented internally. An array consists of a contiguous block of memory addresses plus some metadata that includes the array dimensions, the item size, and the separation between elements for each dimension (the "stride"). "Rolling" each element forwards or backwards would require having different length strides along the same dimension, which is not possible.
That said, you can avoid copying all but one element in baseArray
using slice indexing:
import numpy as np
def profile1(seed=0):
gen = np.random.RandomState(seed)
baseArray = np.zeros(1000000)
for i in range(1000):
baseArray= np.roll(baseArray,1)
otherArray= gen.rand(1000000)
baseArray=baseArray+otherArray
return baseArray
def profile2(seed=0):
gen = np.random.RandomState(seed)
baseArray = np.zeros(1000000)
for i in range(1000):
otherArray = gen.rand(1000000)
tmp1 = baseArray[:-1] # view of the first n-1 elements
tmp2 = baseArray[-1] # copy of the last element
baseArray[1:]=tmp1+otherArray[1:] # write the last n-1 elements
baseArray[0]=tmp2+otherArray[0] # write the first element
return baseArray
These will give identical results:
In [1]: x1 = profile1()
In [2]: x2 = profile2()
In [3]: np.allclose(x1, x2)
Out[3]: True
In practice there isn't that much difference in performance:
In [4]: %timeit profile1()
1 loop, best of 3: 23.4 s per loop
In [5]: %timeit profile2()
1 loop, best of 3: 17.3 s per loop
My function profile3()
is faster by another factor of four. During accumulation, it uses slice indexing with increasing shift instead of any roll. After the loop, a single roll by 1000 elements yields the same alignment as the other functions.
import numpy as np
from timeit import timeit
def profile1(seed=0):
gen = np.random.RandomState(seed)
otherArray= gen.rand(1000000) # outside the loop after Marcel's comment above
baseArray = np.zeros(1000000)
for i in range(1000):
baseArray= np.roll(baseArray,1)
baseArray=baseArray+otherArray
return baseArray
def profile2(seed=0):
gen = np.random.RandomState(seed)
otherArray= gen.rand(1000000)
baseArray = np.zeros(1000000)
for i in range(1000):
tmp1 = baseArray[:-1] # view of the first n-1 elements
tmp2 = baseArray[-1] # copy of the last element
baseArray[1:]=tmp1+otherArray[1:] # write the last n-1 elements
baseArray[0]=tmp2+otherArray[0] # write the first element
return baseArray
def profile3(seed=0):
gen = np.random.RandomState(seed)
otherArray= gen.rand(1000000)
baseArray = np.zeros(1000000)
for i in range(1,1001): # use % or itertools.cycle if range > shape
baseArray[:-i] += otherArray[i:]
baseArray[-i:] += otherArray[:i]
return np.roll(baseArray,1000)
print(timeit(profile1,number=1)) # 7.0
print(timeit(profile2,number=1)) # 4.7
print(timeit(profile3,number=1)) # 1.2
x2 = profile2()
x3 = profile3()
print(np.allclose(x2, x3)) # True
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With