I have a data array a and a list of indices stored in an array idx. I would like to get a 10-element slice starting at each of the indices in idx. Right now I use a for loop to achieve this, but it is extremely slow because I have to do this data fetch about 1000 times per iteration. Below is a minimal working example.
import numpy as np
a = np.random.random(1000)
idx = np.array([1, 5, 89, 54])
# I want "data" array to have np.array([a[1:11], a[5:15], a[89:99], a[54:64]])
# I use for loop below but it is slow
data = []
for id in idx:
    data.append(a[id:id+10])
data = np.array(data)
Is there any way to speed up this process? Thanks.
EDIT: My question is different from the question asked here. In that question the chunk sizes are random, whereas in my question the chunk size is fixed. There are other differences too: I do not have to use up the entire array a, and an element can occur in more than one chunk. My question does not necessarily "split" the array.
(Thanks to suggestion from @MadPhysicist)
This should work:
a[idx.reshape(-1, 1) + np.arange(10)]
Output: an array of shape (L, 10), where L is the length of idx.
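For example, plugging the arrays from the question into this expression gives a (4, 10) result whose first row matches a[1:11] (a quick sanity check using the same data as the question):

import numpy as np

a = np.random.random(1000)
idx = np.array([1, 5, 89, 54])

# broadcast (4, 1) + (10,) -> a (4, 10) matrix of indices, then fancy-index a
data = a[idx.reshape(-1, 1) + np.arange(10)]

print(data.shape)                        # (4, 10)
print(np.array_equal(data[0], a[1:11]))  # True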
Notes:
This does not check for out-of-bounds situations. I suppose it's easy to first ensure that idx doesn't contain such values, as sketched below.
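For example, one way to filter out such indices might look like this (a rough sketch; the mask-based filtering and the names window and idx_safe are just illustrative):

window = 10
# keep only those start indices for which the full 10-element slice fits in a
valid = idx + window <= len(a)
idx_safe = idx[valid]
data = a[idx_safe.reshape(-1, 1) + np.arange(window)]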
Using np.take(a, idx.reshape(-1, 1) + np.arange(10), mode='wrap') is an alternative that will handle out-of-bounds indices by wrapping them around a. Passing mode='clip' instead of mode='wrap' would clip the excessive indices to the last index of a. Note, however, that np.take() probably has quite different performance and scaling characteristics.
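A minimal sketch of the two modes, assuming the same a and idx as above (offsets is just an illustrative name):

offsets = idx.reshape(-1, 1) + np.arange(10)

# mode='wrap': out-of-bounds indices wrap around to the start of a
data_wrap = np.take(a, offsets, mode='wrap')

# mode='clip': out-of-bounds indices are clamped to the last index of a
data_clip = np.take(a, offsets, mode='clip')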