Indexing different sized ranges in a 2D numpy array using a Pythonic vectorized code

I have a 2D NumPy array, and I would like to select a differently sized range from each column, depending on the column index. Here is an example input array, created with a = np.reshape(np.array(range(15)), (5, 3)):

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]]

The list b = [4,3,1] then gives the number of leading rows to take from each column, so that we would get the arrays

[0 3 6 9]
[1 4 7]
[2]

which we can concatenate and flatten to get the final desired output

[0 3 6 9 1 4 7 2]

Currently, to perform this task, I am using the following code

slices = []
for i in range(a.shape[1]):
    slices.append(a[:b[i],i])

c = np.concatenate(slices)

If possible, I would like to replace this loop with vectorized, Pythonic code.

Bonus: The same question, but with b giving the slice length for each row instead of each column.

asked Aug 12 '20 15:08 by xicocaio



1 Answer

We can use broadcasting to build a boolean mask and then index the array with it -

In [150]: a
Out[150]: 
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [151]: b
Out[151]: [4, 3, 1]

In [152]: mask = np.arange(len(a))[:,None] < b

In [153]: a.T[mask.T]
Out[153]: array([0, 3, 6, 9, 1, 4, 7, 2])
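
To see why both transposes are needed, it can help to inspect the mask itself. The following is a minimal sketch that reuses the a and b from the question; the names row_major and col_major are introduced here purely for illustration. Boolean indexing reads elements in row-major order, so masking a directly would interleave the columns, whereas masking a.T with mask.T walks through one column at a time.

```python
import numpy as np

a = np.arange(15).reshape(5, 3)
b = [4, 3, 1]

# Row indices as a (5, 1) column vector, compared against b of shape (3,),
# broadcast to a (5, 3) boolean mask: mask[i, j] is True when i < b[j].
mask = np.arange(len(a))[:, None] < b
print(mask.astype(int))

# Indexing a directly picks the selected elements row by row.
row_major = a[mask]
print(row_major)   # [0 1 2 3 4 6 7 9]

# Transposing both the array and the mask picks them column by column.
col_major = a.T[mask.T]
print(col_major)   # [0 3 6 9 1 4 7 2]
```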

Another way to mask would be -

In [156]: a.T[np.greater.outer(b, np.arange(len(a)))]
Out[156]: array([0, 3, 6, 9, 1, 4, 7, 2])
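
As a quick sanity check, the two masks can be compared with each other and with the loop from the question. This is only a sketch on the example data; the variable names mask_t, outer_mask and looped are illustrative. np.greater.outer(b, idx) evaluates b[j] > idx[i] for every pair, which produces the already transposed mask of shape (3, 5).

```python
import numpy as np

a = np.arange(15).reshape(5, 3)
b = [4, 3, 1]

# Broadcasting version, transposed to shape (3, 5).
mask_t = (np.arange(len(a))[:, None] < b).T

# Outer-comparison version: outer_mask[j, i] is True when b[j] > i.
outer_mask = np.greater.outer(b, np.arange(len(a)))

# Loop-based reference from the question.
looped = np.concatenate([a[:b[i], i] for i in range(a.shape[1])])

print(np.array_equal(mask_t, outer_mask))         # True
print(np.array_equal(a.T[outer_mask], looped))    # True
```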

Bonus : Slice per row

If we instead need to slice per row, with b giving the chunk length for each row, only a few things change. Note that a now has shape (3, 5), so that b has one entry per row -

In [51]: a
Out[51]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

# slice lengths per row
In [52]: b
Out[52]: [4, 3, 1]

# Usual loop based solution :
In [53]: np.concatenate([a[i,:b_i] for i,b_i in enumerate(b)])
Out[53]: array([ 0,  1,  2,  3,  5,  6,  7, 10])

# Vectorized mask based solution :
In [54]: a[np.greater.outer(b, np.arange(a.shape[1]))]
Out[54]: array([ 0,  1,  2,  3,  5,  6,  7, 10])
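
Both cases could be wrapped in a pair of small helpers. The sketch below is one possible arrangement and not part of NumPy; the function names prefix_per_row and prefix_per_column are hypothetical. It assumes lengths has one non-negative entry per row (or per column); a length larger than the available width simply selects the whole row or column.

```python
import numpy as np

def prefix_per_row(a, lengths):
    """Concatenate the first lengths[i] elements of each row i of a."""
    lengths = np.asarray(lengths)
    # mask[i, j] is True when column j lies inside row i's prefix.
    mask = lengths[:, None] > np.arange(a.shape[1])
    return a[mask]

def prefix_per_column(a, lengths):
    """Concatenate the first lengths[j] elements of each column j of a."""
    # Columns of a are rows of a.T, so the row version can be reused.
    return prefix_per_row(a.T, lengths)

b = [4, 3, 1]

rows = prefix_per_row(np.arange(15).reshape(3, 5), b)
print(rows)   # [ 0  1  2  3  5  6  7 10]

cols = prefix_per_column(np.arange(15).reshape(5, 3), b)
print(cols)   # [0 3 6 9 1 4 7 2]
```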

answered Nov 15 '22 05:11 by Divakar