I have a large NumPy array that I want to split into many subarrays by sliding a window of a fixed size across it. Here is my code for subarrays of size 11:
import numpy as np
x = np.arange(10000)
T = np.array([])
for i in range(len(x)-11):
    s = x[i:i+11]
    T = np.concatenate((T, s), axis=0)
But it is very slow for arrays with more than 1 million entries. Is there any way to make it faster?
Actually, this is a case for as_strided:
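import numpy as np
from numpy.lib.stride_tricks import as_strided
# set up
x = np.arange(1000000)
windows = 11
# byte step between consecutive elements of x (x.strides is a 1-tuple)
stride = x.strides[0]
# row i starts at element i, so both axes advance by one element
T = as_strided(x, shape=(len(x) - windows + 1, windows), strides=(stride, stride))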
Output:
array([[ 0, 1, 2, ..., 8, 9, 10],
[ 1, 2, 3, ..., 9, 10, 11],
[ 2, 3, 4, ..., 10, 11, 12],
...,
[999987, 999988, 999989, ..., 999995, 999996, 999997],
[999988, 999989, 999990, ..., 999996, 999997, 999998],
[999989, 999990, 999991, ..., 999997, 999998, 999999]])
Performance:
5.88 µs ± 1.27 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
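On NumPy 1.20 or later, the same kind of windowed view can also be built with sliding_window_view, which works out the shape and strides itself. A minimal sketch, assuming that NumPy version is available; the names window and first_rows are only illustrative:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(1000000)
window = 11

# Read-only view of shape (len(x) - window + 1, window); no data is copied
T = sliding_window_view(x, window)

# Copy only the rows that need to be modified independently of x
first_rows = T[:3].copy()
```

Like as_strided, the result shares memory with x, so it is cheap to create, but copying the whole of T materializes every window.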
I think your current method does not produce what you are describing. Here is a faster method that splits a long array into many subarrays using a list comprehension:
import numpy as np
x = np.arange(10000)
T = np.array([])
T = np.array([np.array(x[i:i+11]) for i in range(len(x)-11)])
sample_1 = '''
import numpy as np
x = np.arange(10000)
T = np.array([])
for i in range(len(x)-11):
    s = x[i:i+11]
    T = np.concatenate((T, s),axis=0)
'''
sample_2 = '''
import numpy as np
x = np.arange(10000)
T = np.array([])
T = np.array([np.array(x[i:i+11]) for i in range(len(x)-11)])
'''
# Testing the times
import timeit
print(timeit.timeit(sample_1, number=1))
print(timeit.timeit(sample_2, number=1))
5.839815437000652 # Your method
0.11047088200211874 # List Comprehension
I timed only one run of each, since the difference is large enough that more iterations would not change the conclusion.
# Your method:
[ 0.00000000e+00 1.00000000e+00 2.00000000e+00 ..., 9.99600000e+03
9.99700000e+03 9.99800000e+03]
# Using List Comprehension:
[[ 0 1 2 ..., 8 9 10]
[ 1 2 3 ..., 9 10 11]
[ 2 3 4 ..., 10 11 12]
...,
[9986 9987 9988 ..., 9994 9995 9996]
[9987 9988 9989 ..., 9995 9996 9997]
[9988 9989 9990 ..., 9996 9997 9998]]
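You can see that my method actually produces subarrays, unlike your code: concatenating each slice onto an initially empty array along axis 0 yields a single flat array, and because np.array([]) defaults to float64, the values come out as floats.
These tests were carried out on x = np.arange(10000), i.e. the ordered integers from 0 to 9999.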
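Note that range(len(x)-11) stops one window short of the end: the last row printed above ends at 9998, not 9999. A minimal sketch of a variant that includes the final window and avoids the Python-level loop, assuming an independent copy of the windows is wanted; the names window, starts and idx are illustrative and not from the code above:

```python
import numpy as np

x = np.arange(10000)
window = 11  # hypothetical name for the window length

# One start index per window; the + 1 keeps the final full window
starts = np.arange(len(x) - window + 1)

# Broadcasting (n_windows, 1) + (window,) gives every element index at once
idx = starts[:, None] + np.arange(window)

# Fancy indexing returns a new array with x's dtype, independent of x
T = x[idx]
```

This allocates the full (n_windows, window) result plus an index array of the same shape, so for very large inputs a strided view is likely the cheaper choice.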