How do I split an ndarray based on array of indexes?

Tags: python, numpy

I'm fairly new to Python, and very new to NumPy.

So far I have an ndarray of data, which is a list of lists, and I have an array of indexes. How can I remove every row whose index is in the array of indexes and put those rows into a new ndarray?

For example, my data looks like

[[1 1 1 1]
 [2 3 4 5]
 [6 7 8 9]
 [2 2 2 2]]

and my index array is

[0 2]

I would want to get two arrays, one of

[[1 1 1 1]
 [6 7 8 9]]

and

[[2 3 4 5]
 [2 2 2 2]]

Extended example, for clarity: my data looks like

[[1 1 1 1]
 [2 3 4 5]
 [6 7 8 9]
 [2 2 2 2]
 [3 3 3 3]
 [4 4 4 4]
 [5 5 5 5]
 [6 6 6 6]
 [7 7 7 7]]

and my index array is

[0 2 3 5]

I would want to get two arrays, one of

[[1 1 1 1]
 [6 7 8 9]
 [2 2 2 2]
 [4 4 4 4]]

and

[[2 3 4 5]
 [3 3 3 3]
 [5 5 5 5]
 [6 6 6 6]
 [7 7 7 7]]

I have looked into numpy.take() and numpy.choose() but I could not figure it out. Thanks!

Edit: I should also add that my input data and index array are of variable length, depending on the data set. I would like a solution that works for variable sizes.

asked Oct 26 '12 19:10

k.schroeder31

1 Answer

You already have take, so what you need is basically the opposite of take. You can get that neatly with a boolean mask:

import numpy as np

a = np.arange(16).reshape((8, 2))
b = [2, 6, 7]
mask = np.ones(len(a), dtype=bool)  # start with every row selected
mask[b] = False                     # deselect the rows listed in b
x, y = a[b], a[mask]  # instead of a[b] you could also do a[~mask]
print(x)
# [[ 4  5]
#  [12 13]
#  [14 15]]
print(y)
# [[ 0  1]
#  [ 2  3]
#  [ 6  7]
#  [ 8  9]
#  [10 11]]

So you just create a boolean mask that is True wherever b would not select from a.
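
Applied to the extended example from the question, the same idea might look like the sketch below. The helper name `split_rows` is made up for illustration, and it assumes every index is a valid row number for the array.

```python
import numpy as np

def split_rows(data, indexes):
    """Hypothetical helper: return (selected rows, remaining rows)."""
    data = np.asarray(data)
    indexes = np.asarray(indexes, dtype=int)
    mask = np.ones(len(data), dtype=bool)  # True means "keep in the remainder"
    mask[indexes] = False                  # rows named in indexes leave the remainder
    return data[indexes], data[mask]

data = np.array([[1, 1, 1, 1],
                 [2, 3, 4, 5],
                 [6, 7, 8, 9],
                 [2, 2, 2, 2],
                 [3, 3, 3, 3],
                 [4, 4, 4, 4],
                 [5, 5, 5, 5],
                 [6, 6, 6, 6],
                 [7, 7, 7, 7]])

selected, remaining = split_rows(data, [0, 2, 3, 5])
print(selected)   # rows 0, 2, 3 and 5
print(remaining)  # every other row, in original order
```

Nothing here depends on the number of rows or on how many indexes are passed, so it should cope with variable-sized inputs.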

If you instead want to cut the array into consecutive chunks at the given indexes, there is already np.split, which handles this (it's pure Python code, but that should not really bother you):

>>> import numpy as np
>>> a = np.arange(16).reshape((8, 2))
>>> b = [2, 6]
>>> print(np.split(a, b, axis=0))  # plus some extra formatting
[array([[0, 1],
       [2, 3]]),
 array([[ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]]),
 array([[12, 13],
       [14, 15]])]

split always includes the leading slice 0:b[0] and the trailing slice b[-1]: in its result, so you can simply slice those out of the result if you do not need them. If your splits are regular (all the same size), you may be better off using reshape.
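
As a rough illustration of both remarks, the sketch below drops the first and last chunks from the np.split result and then uses reshape for equal-sized chunks. The variable names (`pieces`, `middle`, `blocks`) are only illustrative, and the reshape variant assumes the number of rows divides evenly by the chunk size.

```python
import numpy as np

a = np.arange(16).reshape((8, 2))
b = [2, 6]

pieces = np.split(a, b, axis=0)  # three chunks: rows 0:2, 2:6 and 6:
middle = pieces[1:-1]            # drop the leading and trailing chunks

# Regular splits: 8 rows become 4 blocks of 2 rows each
blocks = a.reshape((4, 2, 2))
print(middle[0])
print(blocks[1])  # rows 2 and 3 of a
```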

Note also that np.split returns views, so if you change those arrays you also change the original unless you call .copy() first. The indexing approach above (a[b] and a[mask]) returns copies instead.
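
A small check of that behaviour, as a sketch: `np.shares_memory` reports whether two arrays overlap in memory, and the variable names are illustrative only.

```python
import numpy as np

a = np.arange(16).reshape((8, 2))
first, rest = np.split(a, [2], axis=0)

print(np.shares_memory(first, a))  # the split piece is a view of a
first[0, 0] = 100                  # so this also writes to a[0, 0]

safe = rest.copy()                 # an independent copy
safe[0, 0] = -1                    # leaves a[2, 0] untouched

picked = a[[2, 6]]                 # integer-array indexing makes a copy
print(np.shares_memory(picked, a))
```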

answered Oct 26 '22 12:10

seberg
