I have the following NumPy arrays:
arr_1 = np.array([[1,2],[3,4],[5,6]]) # 3 X 2
arr_2 = np.array([[0.5,0.6],[0.7,0.8],[0.9,1.0],[1.1,1.2],[1.3,1.4]]) # 5 X 2
arr_1 is clearly a 3 X 2 array, whereas arr_2 is a 5 X 2 array.
Now, without looping, I want to element-wise multiply arr_1 and arr_2 by applying a sliding window (window size 3) over arr_2.
Example:
Multiplication 1: np.multiply(arr_1,arr_2[:3,:])
Multiplication 2: np.multiply(arr_1,arr_2[1:4,:])
Multiplication 3: np.multiply(arr_1,arr_2[2:5,:])
I want to do this in some sort of matrix-multiplication form to make it faster than my current solution, which is of the form:
for i in range(arr_2.shape[0] - 3 + 1):
    np.multiply(arr_1, arr_2[i:i+3,:])
So if the number of rows in arr_2 is large (on the order of tens of thousands), this solution doesn't really scale well.
Any help would be much appreciated.
We can use NumPy broadcasting to create those sliding-window indices in a vectorized manner. Then, we can simply index into arr_2 with those indices to create a 3D array and perform element-wise multiplication with the 2D array arr_1, which in turn will bring on broadcasting again.
So, we would have a vectorized implementation like so -
W = arr_1.shape[0] # Window size
idx = np.arange(arr_2.shape[0]-W+1)[:,None] + np.arange(W) # (L, W) sliding window row indices
out = arr_1*arr_2[idx] # (L, W, 2) via broadcasting
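To make the indexing concrete, here is what idx looks like for the toy arrays from the question (a small sketch; the values are the ones given there):
import numpy as np

arr_1 = np.array([[1,2],[3,4],[5,6]])                                   # 3 X 2
arr_2 = np.array([[0.5,0.6],[0.7,0.8],[0.9,1.0],[1.1,1.2],[1.3,1.4]])   # 5 X 2

W = arr_1.shape[0]
idx = np.arange(arr_2.shape[0]-W+1)[:,None] + np.arange(W)
# idx is [[0 1 2]
#         [1 2 3]
#         [2 3 4]]  -- one row of window indices per window position
out = arr_1 * arr_2[idx]   # shape (3, 3, 2); out[i] equals np.multiply(arr_1, arr_2[i:i+3,:])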
Runtime test and verify results -
In [143]: # Input arrays
...: arr_1 = np.random.rand(3,2)
...: arr_2 = np.random.rand(10000,2)
...:
...: def org_app(arr_1,arr_2):
...:     W = arr_1.shape[0] # Window size
...:     L = arr_2.shape[0]-W+1
...:     out = np.empty((L,W,arr_1.shape[1]))
...:     for i in range(L):
...:         out[i] = np.multiply(arr_1,arr_2[i:i+W,:])
...:     return out
...:
...: def vectorized_app(arr_1,arr_2):
...:     W = arr_1.shape[0] # Window size
...:     idx = np.arange(arr_2.shape[0]-W+1)[:,None] + np.arange(W)
...:     return arr_1*arr_2[idx]
...:
In [144]: np.allclose(org_app(arr_1,arr_2),vectorized_app(arr_1,arr_2))
Out[144]: True
In [145]: %timeit org_app(arr_1,arr_2)
10 loops, best of 3: 47.3 ms per loop
In [146]: %timeit vectorized_app(arr_1,arr_2)
1000 loops, best of 3: 1.21 ms per loop
This is a nice case to test the speed of as_strided and Divakar's broadcasting.
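The timing cells below don't show their setup; presumably it matches Divakar's, along these lines (an assumed reconstruction using the names that appear in the cells):
import numpy as np
from numpy.lib.stride_tricks import as_strided

arr1 = np.random.rand(3,2)        # window-sized array, float64
arr2 = np.random.rand(10000,2)    # long array, float64
W = arr1.shape[0]                 # window size
L = arr2.shape[0]-W+1             # number of window positions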
In [281]: %%timeit
...: out=np.empty((L,W,arr1.shape[1]))
...: for i in range(L):
...:     out[i]=np.multiply(arr1,arr2[i:i+W,:])
...:
10 loops, best of 3: 48.9 ms per loop
In [282]: %%timeit
...: idx=np.arange(L)[:,None]+np.arange(W)
...: out=arr1*arr2[idx]
...:
100 loops, best of 3: 2.18 ms per loop
In [283]: %%timeit
...: arr3=as_strided(arr2, shape=(L,W,2), strides=(16,16,8))
...: out=arr1*arr3
...:
1000 loops, best of 3: 805 µs per loop
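Note that strides=(16,16,8) hard-codes the byte layout of a C-contiguous float64 array with 2 columns (8 bytes per element, 16 per row), which is why as_strided is fast but fragile here. On NumPy 1.20+, sliding_window_view from numpy.lib.stride_tricks builds an equivalent read-only view and computes the strides for you; a minimal sketch under the same setup:
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

# Windows along axis 0; the window axis is appended last, giving shape (L, 2, W)
view = sliding_window_view(arr2, W, axis=0)
out = arr1 * view.transpose(0, 2, 1)   # reorder to (L, W, 2), then broadcast against arr1
The view itself copies no data; only the final multiplication materializes the (L, W, 2) result.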
See Create Numpy array without enumerating array for more of a comparison of these methods.