
How do I split an ndarray based on array of indexes?

Tags: python, numpy

I'm fairly new to Python, and very new to Numpy.

So far I have an ndarray of data, which is a list of lists, and an array of indexes. How can I remove every row whose index is in the array of indexes and put those rows into a new ndarray?

For example, my data looks like

[[1 1 1 1]
 [2 3 4 5]
 [6 7 8 9]
 [2 2 2 2]]

and my index array is

[0 2]

I would want to get two arrays, one of

[[1 1 1 1]
 [6 7 8 9]]

and

[[2 3 4 5]
 [2 2 2 2]]

Extended example, for clarity: my data looks like

[[1 1 1 1]
 [2 3 4 5]
 [6 7 8 9]
 [2 2 2 2]
 [3 3 3 3]
 [4 4 4 4]
 [5 5 5 5]
 [6 6 6 6]
 [7 7 7 7]]

and my index array is

[0 2 3 5]

I would want to get two arrays, one of

[[1 1 1 1]
 [6 7 8 9]
 [2 2 2 2]
 [4 4 4 4]]

and

[[2 3 4 5]
 [3 3 3 3]
 [5 5 5 5]
 [6 6 6 6]
 [7 7 7 7]]

I have looked into numpy.take() and numpy.choose() but I could not figure it out. Thanks!

Edit: I should also add that my input data and index array vary in length from one data set to the next, so I would like a solution that works for any size.

k.schroeder31 asked Oct 26 '12 19:10



1 Answer

You already have take and basically need the opposite of take. You can get that nicely with boolean indexing:

import numpy as np

a = np.arange(16).reshape((8, 2))
b = [2, 6, 7]
mask = np.ones(len(a), dtype=bool)  # start with every row selected
mask[b] = False                     # deselect the rows listed in b
x, y = a[b], a[mask]  # instead of a[b] you could also do a[~mask]
print(x)
[[ 4  5]
 [12 13]
 [14 15]]
print(y)
[[ 0  1]
 [ 2  3]
 [ 6  7]
 [ 8  9]
 [10 11]]

So you just create a boolean mask that is True wherever b would not select from a.
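Applied to the data from the question, the mask approach might look like the sketch below. The helper name split_rows is made up for illustration and is not part of NumPy; np.delete is shown as one alternative way to get the remaining rows.

```python
import numpy as np

def split_rows(data, idx):
    """Return (rows at idx, all other rows). Hypothetical helper name."""
    data = np.asarray(data)
    mask = np.ones(len(data), dtype=bool)  # True = row stays in "rest"
    mask[idx] = False                      # rows listed in idx are taken out
    return data[idx], data[mask]

data = np.array([[1, 1, 1, 1],
                 [2, 3, 4, 5],
                 [6, 7, 8, 9],
                 [2, 2, 2, 2]])
idx = np.array([0, 2])

picked, rest = split_rows(data, idx)
print(picked)  # rows 0 and 2
print(rest)    # rows 1 and 3

# np.delete builds the same "rest" array without an explicit mask
rest_alt = np.delete(data, idx, axis=0)
```

Nothing here depends on the number of rows or the length of idx, so the same call should work for data sets of any size.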


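There is also np.split, which is the tool to use if the indexes are meant as split points between contiguous blocks of rows rather than as individual rows to extract (it's pure Python code, but that should not really bother you):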

>>> import numpy as np
>>> a = np.arange(16).reshape((8, 2))
>>> b = [2, 6]
>>> print(np.split(a, b, axis=0))  # plus some extra formatting
[array([[0, 1],
       [2, 3]]),
 array([[ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11]]),
 array([[12, 13],
        [14, 15]])]

split always includes the leading slice 0:b[0] and the trailing slice b[-1]:, so if you do not want them you can just slice them out of the result. If you have regular splits (all the same size), you may be better off using reshape.
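A minimal sketch of both remarks, reusing the a and b from the np.split example. The block size of 2 rows in the reshape call is an assumption chosen only so that it divides the 8 rows evenly.

```python
import numpy as np

a = np.arange(16).reshape((8, 2))
b = [2, 6]

parts = np.split(a, b, axis=0)   # pieces a[:2], a[2:6], a[6:]
middle = parts[1:-1]             # drop the leading and trailing pieces

# Regular splits: 8 rows cut into 4 blocks of 2 rows each
blocks = a.reshape(-1, 2, a.shape[1])
print(blocks.shape)  # (4, 2, 2)
print(blocks[1])     # rows 2 and 3 of a
```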

Note also that np.split returns views, so if you change those arrays you change the original unless you call .copy() first. Selecting with an index list or a boolean mask, as in the first example, returns copies.
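The difference can be checked with np.shares_memory. The sketch below uses the same kind of array as the examples above; the variable names are only illustrative.

```python
import numpy as np

a = np.arange(16).reshape((8, 2))

first = np.split(a, [2, 6], axis=0)[0]   # view of a[:2]
picked = a[[2, 6, 7]]                    # index-list selection makes a copy

print(np.shares_memory(a, first))    # True
print(np.shares_memory(a, picked))   # False

first[0, 0] = -1      # writes through to a
picked[0, 0] = -99    # does not touch a
print(a[0, 0], a[2, 0])  # -1 4

# Calling .copy() gives a piece that is independent of a
safe = np.split(a, [2, 6], axis=0)[0].copy()
```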

seberg answered Oct 26 '22 12:10