Sampling a fixed length sequence from a numpy array

I have a data array a and a list of indices stored in the array idx. I would like to get a chunk of length 10 starting at each of the indices in idx. Right now I use a for loop to achieve this, but it is extremely slow because I have to do this data fetch about 1000 times per iteration. Below is a minimal working example.

import numpy as np
a = np.random.random(1000)
idx = np.array([1, 5, 89, 54])

# I want "data" array to have np.array([a[1:11], a[5:15], a[89:99], a[54:64]])
# I use for loop below but it is slow
data = []

for start in idx:  # "start" avoids shadowing the built-in id()
    data.append(a[start:start+10])
data = np.array(data)

Is there any way to speed up this process? Thanks.

EDIT: My question is different from the question asked here. In that question, the chunk sizes are random, whereas in mine the chunk size is fixed. There are other differences too: I do not have to use up the entire array a, and an element can occur in more than one chunk. My question does not necessarily "split" the array.

asked Dec 12 '20 08:12 by learner




1 Answer

(Thanks to a suggestion from @MadPhysicist)

This should work:

a[idx.reshape(-1, 1) + np.arange(10)]

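Output: an array of shape (L, 10), where L is the length of idx. Here idx.reshape(-1, 1) has shape (L, 1) and np.arange(10) has shape (10,), so their sum broadcasts to an (L, 10) array of indices whose row i is idx[i], idx[i]+1, ..., idx[i]+9. Indexing a with that array fetches all the chunks in a single call.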
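As a quick check, here is a minimal sketch that builds the index array step by step and compares the result with the loop from the question. The names starts, offsets, windows and expected are only illustrative.

```python
import numpy as np

a = np.random.random(1000)
idx = np.array([1, 5, 89, 54])

# Column of start positions, shape (4, 1)
starts = idx.reshape(-1, 1)
# Row of offsets within each chunk, shape (10,)
offsets = np.arange(10)

# Broadcasting gives a (4, 10) integer array;
# row i holds idx[i], idx[i]+1, ..., idx[i]+9
windows = starts + offsets

# Integer-array indexing gathers every chunk in one call
data = a[windows]

# Reference result built with the loop from the question
expected = np.array([a[i:i + 10] for i in idx])

print(data.shape)                      # (4, 10)
print(np.array_equal(data, expected))  # True
```

Note that the result is a new array (a copy), not a view into a, so modifying data does not change a.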

Notes:

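  1. This does not check for out-of-bounds indices: the indexing raises an IndexError if any idx[i] + 9 goes past the last index of a, and negative indices silently count from the end of a. I suppose it's easy to first ensure that idx doesn't contain such values.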

  2. np.take(a, idx.reshape(-1, 1) + np.arange(10), mode='wrap') is an alternative that handles out-of-bounds indices by wrapping them around a. Passing mode='clip' instead of mode='wrap' clips indices past the end to the last index of a. However, np.take() may well have different performance and scaling characteristics, so it is worth timing both on your own data.
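The two notes can be sketched as follows, using a deliberately small array so that the out-of-bounds case is easy to see. The sample values, the chunk length n, and the valid mask are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np

a = np.arange(20)            # values equal their own indices, for readability
idx = np.array([1, 5, 15])   # 15 + 9 = 24 runs past the end of a
n = 10                       # chunk length

windows = idx.reshape(-1, 1) + np.arange(n)

# Note 1: keep only the start indices whose whole chunk fits inside a.
# (Plain a[windows] would raise IndexError here because of the last row.)
valid = (idx >= 0) & (idx + n <= len(a))
safe = a[windows[valid]]     # shape (2, 10): chunks starting at 1 and 5

# Note 2: let np.take deal with the overflow instead of raising.
wrapped = np.take(a, windows, mode='wrap')   # indices wrap modulo len(a)
clipped = np.take(a, windows, mode='clip')   # indices clamped to [0, len(a) - 1]

print(wrapped[2])  # [15 16 17 18 19  0  1  2  3  4]
print(clipped[2])  # [15 16 17 18 19 19 19 19 19 19]
```

Which behaviour is appropriate depends on the data: filtering drops incomplete chunks, while wrapping and clipping keep the output shape (L, 10) but fill the overflow with values that are not a contiguous run in a.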

answered Nov 14 '22 22:11 by fountainhead