Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Alternative to numpy roll without copying array

I am doing something like following code and I am not happy with performance of np.roll() function. I am summing baseArray and otherArray, where baseArray is rolled by one element in each iteration. But I do not need the copy of the baseArray when I roll it, I would rather prefer a view such that for example when I sum baseArray with other array and if baseArray was rolled twice then the 2nd element of basearray is summed with 0th element of otherArray, 3rd element of baseArray is summed with 1st of otherArray etc.

I.E. to achieve the same result as with np.roll() but without copying the array.

import numpy as np
from numpy import random
import cProfile

def profile():
    baseArray = np.zeros(1000000)
    for i in range(1000):
        baseArray= np.roll(baseArray,1)
        otherArray= np.random.rand(1000000)
        baseArray=baseArray+otherArray

cProfile.run('profile()')

output (note 3rd row - the roll function):

         9005 function calls in 26.741 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    5.123    5.123   26.740   26.740 <ipython-input-101-9006a6c0d2e3>:5(profile)
        1    0.001    0.001   26.741   26.741 <string>:1(<module>)
     1000    0.237    0.000    8.966    0.009 numeric.py:1327(roll)
     1000    0.004    0.000    0.005    0.000 numeric.py:476(asanyarray)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1000   12.650    0.013   12.650    0.013 {method 'rand' of 'mtrand.RandomState' objects}
     1000    0.005    0.000    0.005    0.000 {method 'reshape' of 'numpy.ndarray' objects}
     1000    6.390    0.006    6.390    0.006 {method 'take' of 'numpy.ndarray' objects}
     2000    1.345    0.001    1.345    0.001 {numpy.core.multiarray.arange}
     1000    0.001    0.000    0.001    0.000 {numpy.core.multiarray.array}
     1000    0.985    0.001    0.985    0.001 {numpy.core.multiarray.concatenate}
        1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
        1    0.000    0.000    0.000    0.000 {range}
like image 591
Marcel Avatar asked Mar 10 '16 12:03

Marcel


2 Answers

I'm pretty sure it's impossible to avoid a copy due to the way in which numpy arrays are represented internally. An array consists of a contiguous block of memory addresses plus some metadata that includes the array dimensions, the item size, and the separation between elements for each dimension (the "stride"). "Rolling" each element forwards or backwards would require having different length strides along the same dimension, which is not possible.


That said, you can avoid copying all but one element in baseArray using slice indexing:

import numpy as np

def profile1(seed=0):
    gen = np.random.RandomState(seed)
    baseArray = np.zeros(1000000)
    for i in range(1000):
        baseArray= np.roll(baseArray,1)
        otherArray= gen.rand(1000000)
        baseArray=baseArray+otherArray
    return baseArray

def profile2(seed=0):
    gen = np.random.RandomState(seed)
    baseArray = np.zeros(1000000)
    for i in range(1000):
        otherArray = gen.rand(1000000)
        tmp1 = baseArray[:-1]               # view of the first n-1 elements
        tmp2 = baseArray[-1]                # copy of the last element
        baseArray[1:]=tmp1+otherArray[1:]   # write the last n-1 elements
        baseArray[0]=tmp2+otherArray[0]     # write the first element
    return baseArray

These will give identical results:

In [1]: x1 = profile1()

In [2]: x2 = profile2()

In [3]: np.allclose(x1, x2)
Out[3]: True

In practice there isn't that much difference in performance:

In [4]: %timeit profile1()
1 loop, best of 3: 23.4 s per loop

In [5]: %timeit profile2()
1 loop, best of 3: 17.3 s per loop
like image 68
ali_m Avatar answered Oct 12 '22 23:10

ali_m


My function profile3() is faster by another factor of four. During accumulation, it uses slice indexing with increasing shift instead of any roll. After the loop, a single roll by 1000 elements yields the same alignment as the other functions.

import numpy as np
from timeit import timeit

def profile1(seed=0):
    gen = np.random.RandomState(seed)
    otherArray= gen.rand(1000000)           # outside the loop after Marcel's comment above
    baseArray = np.zeros(1000000)
    for i in range(1000):
        baseArray= np.roll(baseArray,1)
        baseArray=baseArray+otherArray
    return baseArray

def profile2(seed=0):
    gen = np.random.RandomState(seed)
    otherArray= gen.rand(1000000)
    baseArray = np.zeros(1000000)
    for i in range(1000):
        tmp1 = baseArray[:-1]               # view of the first n-1 elements
        tmp2 = baseArray[-1]                # copy of the last element
        baseArray[1:]=tmp1+otherArray[1:]   # write the last n-1 elements
        baseArray[0]=tmp2+otherArray[0]     # write the first element
    return baseArray

def profile3(seed=0):
    gen = np.random.RandomState(seed)
    otherArray= gen.rand(1000000)
    baseArray = np.zeros(1000000)
    for i in range(1,1001): # use % or itertools.cycle if range > shape
        baseArray[:-i] += otherArray[i:]
        baseArray[-i:] += otherArray[:i]
    return np.roll(baseArray,1000)

print(timeit(profile1,number=1))  # 7.0
print(timeit(profile2,number=1))  # 4.7
print(timeit(profile3,number=1))  # 1.2

x2 = profile2()
x3 = profile3()
print(np.allclose(x2, x3))  # True
like image 26
Rainald62 Avatar answered Oct 13 '22 00:10

Rainald62