deleting rows in numpy array

Tags:

python

numpy

delete-row

I have an array that might look like this:

ANOVAInputMatrixValuesArray = [[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                               [ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]]

Notice that one of the rows has a zero value at the end. I want to delete any row that contains a zero, while keeping any row that contains non-zero values in all cells.

But the array will have different numbers of rows every time it is populated, and the zeros will be located in different rows each time.

I get the number of non-zero elements in each row with the following line of code:

NumNonzeroElementsInRows = (ANOVAInputMatrixValuesArray != 0).sum(1)

For the array above, NumNonzeroElementsInRows contains: [5 4]

The five indicates that all possible values in row 0 are nonzero, while the four indicates that one of the possible values in row 1 is a zero.

Therefore, I am trying to use the following lines of code to find and delete rows that contain zero values.

for q in range(len(NumNonzeroElementsInRows)):
    if NumNonzeroElementsInRows[q] < NumNonzeroElementsInRows.max():
        p.delete(ANOVAInputMatrixValuesArray, q, axis=0)

But for some reason, this code does not seem to do anything, even though print statements show that all of the variables are populated correctly before this code runs.

There must be some easy way to simply "delete any row that contains a zero value."

Can anyone show me what code to write to accomplish this?

asked Oct 06 '10 22:10

MedicalMath

5 Answers

The simplest way to delete rows and columns from arrays is the numpy.delete function.

Suppose I have the following array x:

import numpy

x = numpy.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

To delete the first row, do this:

x = numpy.delete(x, (0), axis=0)

To delete the third column, do this:

x = numpy.delete(x, (2), axis=1)

So you could find the indices of the rows which have a 0 in them, put them in a list or a tuple and pass this as the second argument of the function.
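A minimal sketch of that recipe, assuming the data is already a 2-D NumPy array (the name `arr` is just for illustration). It collects the indices of every row containing an exact zero and passes them to `numpy.delete` in one call:

```python
import numpy as np

arr = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                [0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])

# Boolean vector: True for each row that contains at least one exact zero.
has_zero = (arr == 0).any(axis=1)

# Integer indices of those rows (here only row 1).
rows_with_zero = np.where(has_zero)[0]

# numpy.delete accepts a sequence of indices and returns a NEW array,
# so the result must be assigned back to keep it.
arr = np.delete(arr, rows_with_zero, axis=0)

print(arr.shape)  # (1, 5)
```

Deleting all the rows in a single call avoids the index shifting that happens when rows are removed one at a time.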

answered Sep 23 '22 23:09

Jaidev Deshpande

Here's a one-liner (yes, it is similar to user333700's, but a little more straightforward):

>>> import numpy as np
>>> arr = np.array([[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
...                 [ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])
>>> arr[arr.all(1)]
array([[ 0.96488889,  0.73641667,  0.67521429,  0.592875  ,  0.53172222]])

This method is also much faster than the masked array method for large matrices: for a 2048 x 5 matrix, it was about 1000x faster.

Also, user333700's method (from his comment) was slightly faster in my tests, though it boggles my mind why.
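To make the one-liner's mechanics explicit, here is a small sketch that splits it into its two steps: `ndarray.all(axis=1)` builds one boolean per row, and boolean indexing on the first axis keeps the rows marked True. The intermediate names `keep` and `filtered` are only for illustration:

```python
import numpy as np

arr = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                [0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])

# all() treats every nonzero value as True, so a row is True only
# if none of its elements equals zero.
keep = arr.all(axis=1)
print(keep.tolist())  # [True, False]

# Boolean indexing keeps only the rows marked True (still 2-D).
filtered = arr[keep]
print(filtered.shape)  # (1, 5)
```

One thing to be aware of: NaN is nonzero, so a row that contains NaN but no zero is kept by this test.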

answered Sep 22 '22 23:09

Justin Peel

This is similar to your original approach, and will use less space than unutbu's answer, but I suspect it will be slower.

>>> import numpy as np
>>> p = np.array([[1.5, 0], [1.4,1.5], [1.6, 0], [1.7, 1.8]])
>>> p
array([[ 1.5,  0. ],
       [ 1.4,  1.5],
       [ 1.6,  0. ],
       [ 1.7,  1.8]])
>>> nz = (p == 0).sum(1)
>>> q = p[nz == 0, :]
>>> q
array([[ 1.4,  1.5],
       [ 1.7,  1.8]])

By the way, your line p.delete() doesn't work for me: ndarrays don't have a .delete attribute.
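A likely reason the loop in the question seems to do nothing is that `np.delete` never modifies its input; it returns a new array, and a result that is never assigned is simply discarded. A minimal sketch, reusing the small array `p` from above, shows that and one way to keep the question's row-by-row loop working. It compares each row's nonzero count against the number of columns rather than the maximum count (the maximum would keep every row if all of them contained a zero):

```python
import numpy as np

p = np.array([[1.5, 0.0], [1.4, 1.5], [1.6, 0.0], [1.7, 1.8]])

# np.delete returns a new array and leaves p unchanged.
result = np.delete(p, 0, axis=0)
print(p.shape, result.shape)  # (4, 2) (3, 2)

# When deleting one row at a time, reassign the result and walk the
# indices from last to first, so each deletion does not shift the
# positions of rows that still need to be checked.
counts = (p != 0).sum(1)          # nonzero count per row: [1 2 1 2]
for q in reversed(range(len(counts))):
    if counts[q] < p.shape[1]:    # fewer nonzeros than columns => has a zero
        p = np.delete(p, q, axis=0)

print(p)
```

Each call to `np.delete` copies the array, so for large inputs the single boolean-index expression above is usually preferable to this loop.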

answered Sep 24 '22 23:09

mtrw

NumPy provides a function for exactly this: given a masked array a, calling numpy.ma.compress_rows(a) deletes every row that contains a masked value. I suspect this is much faster.
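A minimal sketch of this approach, assuming the zeros are first turned into masked values with `numpy.ma.masked_equal` (the step that builds the masked array is not spelled out above):

```python
import numpy as np

arr = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                [0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])

# Mask every element equal to zero.
masked = np.ma.masked_equal(arr, 0)

# compress_rows drops each row of a 2-D masked array that contains
# at least one masked value.
compressed = np.ma.compress_rows(masked)
print(compressed.shape)  # (1, 5)
```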

answered Sep 21 '22 23:09

jeps

import numpy as np
arr = np.array([[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                [ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])
# Keep only the rows in which every element is nonzero.
print(arr[np.where((arr != 0.).all(axis=1))])
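When combining `np.where` with indexing, the shape of the condition decides what gets selected. With a 2-D condition, `np.where` returns one index array per axis, so indexing picks out individual elements and flattens the result; reducing the condition to one value per row first yields row indices instead. A short sketch comparing the two:

```python
import numpy as np

arr = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                [0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])

# Element-wise condition: a (row_indices, col_indices) pair,
# so the result is a flat 1-D array of the nonzero elements.
elements = arr[np.where(arr != 0.)]
print(elements.shape)  # (9,)

# Row-wise condition: one index per surviving row, so whole rows
# are selected and the result stays 2-D.
row_idx = np.where((arr != 0.).all(axis=1))[0]
rows = arr[row_idx]
print(rows.shape)  # (1, 5)
```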