Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

deleting rows in numpy array

I have an array that might look like this:

ANOVAInputMatrixValuesArray = [[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 
0.53172222], [ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]]

Notice that one of the rows has a zero value at the end. I want to delete any row that contains a zero, while keeping any row that contains non-zero values in all cells.

But the array will have different numbers of rows every time it is populated, and the zeros will be located in different rows each time.

I get the number of non-zero elements in each row with the following line of code:

NumNonzeroElementsInRows    = (ANOVAInputMatrixValuesArray != 0).sum(1)

For the array above, NumNonzeroElementsInRows contains: [5 4]

The five indicates that all possible values in row 0 are nonzero, while the four indicates that one of the possible values in row 1 is a zero.

Therefore, I am trying to use the following lines of code to find and delete rows that contain zero values.

for q in range(len(NumNonzeroElementsInRows)):
    if NumNonzeroElementsInRows[q] < NumNonzeroElementsInRows.max():
        p.delete(ANOVAInputMatrixValuesArray, q, axis=0)

But for some reason, this code does not seem to do anything, even though doing a lot of print commands indicates that all of the variables seem to be populating correctly leading up to the code.

There must be some easy way to simply "delete any row that contains a zero value."

Can anyone show me what code to write to accomplish this?

like image 343
MedicalMath Avatar asked Oct 06 '10 22:10

MedicalMath


People also ask

How do I delete a row in NumPy?

Using the NumPy function np. delete() , you can delete any row and column from the NumPy array ndarray . Specify the axis (dimension) and position (row number, column number, etc.). It is also possible to select multiple rows and columns using a slice or a list.

How do I delete multiple rows in Python NumPy?

delete() – The numpy. delete() is a function in Python which returns a new array with the deletion of sub-arrays along with the mentioned axis. By keeping the value of the axis as zero, there are two possible ways to delete multiple rows using numphy. delete().

How do I delete a row from NumPy array based on condition?

np. delete(ndarray, index, axis): Delete items of rows or columns from the NumPy array based on given index conditions and axis specified, the parameter ndarray is the array on which the manipulation will happen, the index is the particular rows based on conditions to be deleted, axis=0 for removing rows in our case.

How do I remove part of a NumPy array?

To remove an element from a NumPy array: Specify the index of the element to remove. Call the numpy. delete() function on the array for the given index.


5 Answers

The simplest way to delete rows and columns from arrays is the numpy.delete method.

Suppose I have the following array x:

x = array([[1,2,3],
        [4,5,6],
        [7,8,9]])

To delete the first row, do this:

x = numpy.delete(x, (0), axis=0)

To delete the third column, do this:

x = numpy.delete(x,(2), axis=1)

So you could find the indices of the rows which have a 0 in them, put them in a list or a tuple and pass this as the second argument of the function.

like image 188
Jaidev Deshpande Avatar answered Sep 23 '22 23:09

Jaidev Deshpande


Here's a one liner (yes, it is similar to user333700's, but a little more straightforward):

>>> import numpy as np
>>> arr = np.array([[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222], 
                [ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])
>>> print arr[arr.all(1)]
array([[ 0.96488889,  0.73641667,  0.67521429,  0.592875  ,  0.53172222]])

By the way, this method is much, much faster than the masked array method for large matrices. For a 2048 x 5 matrix, this method is about 1000x faster.

By the way, user333700's method (from his comment) was slightly faster in my tests, though it boggles my mind why.

like image 40
Justin Peel Avatar answered Sep 22 '22 23:09

Justin Peel


This is similar to your original approach, and will use less space than unutbu's answer, but I suspect it will be slower.

>>> import numpy as np
>>> p = np.array([[1.5, 0], [1.4,1.5], [1.6, 0], [1.7, 1.8]])
>>> p
array([[ 1.5,  0. ],
       [ 1.4,  1.5],
       [ 1.6,  0. ],
       [ 1.7,  1.8]])
>>> nz = (p == 0).sum(1)
>>> q = p[nz == 0, :]
>>> q
array([[ 1.4,  1.5],
       [ 1.7,  1.8]])

By the way, your line p.delete() doesn't work for me - ndarrays don't have a .delete attribute.

like image 30
mtrw Avatar answered Sep 24 '22 23:09

mtrw


numpy provides a simple function to do the exact same thing: supposing you have a masked array 'a', calling numpy.ma.compress_rows(a) will delete the rows containing a masked value. I guess this is much faster this way...

like image 21
jeps Avatar answered Sep 21 '22 23:09

jeps


import numpy as np 
arr = np.array([[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],[ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])
print(arr[np.where(arr != 0.)])
like image 45
Prokhozhii Avatar answered Sep 22 '22 23:09

Prokhozhii