I have an array that might look like this:
ANOVAInputMatrixValuesArray = [[ 0.96488889, 0.73641667, 0.67521429, 0.592875,
0.53172222], [ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]]
Notice that one of the rows has a zero value at the end. I want to delete any row that contains a zero, while keeping any row that contains non-zero values in all cells.
But the array will have different numbers of rows every time it is populated, and the zeros will be located in different rows each time.
I get the number of non-zero elements in each row with the following line of code:
NumNonzeroElementsInRows = (ANOVAInputMatrixValuesArray != 0).sum(1)
For the array above, NumNonzeroElementsInRows
contains: [5 4]
The five indicates that all possible values in row 0 are nonzero, while the four indicates that one of the possible values in row 1 is a zero.
Therefore, I am trying to use the following lines of code to find and delete rows that contain zero values.
for q in range(len(NumNonzeroElementsInRows)):
if NumNonzeroElementsInRows[q] < NumNonzeroElementsInRows.max():
p.delete(ANOVAInputMatrixValuesArray, q, axis=0)
But for some reason, this code does not seem to do anything, even though doing a lot of print commands indicates that all of the variables seem to be populating correctly leading up to the code.
There must be some easy way to simply "delete any row that contains a zero value."
Can anyone show me what code to write to accomplish this?
Using the NumPy function np. delete() , you can delete any row and column from the NumPy array ndarray . Specify the axis (dimension) and position (row number, column number, etc.). It is also possible to select multiple rows and columns using a slice or a list.
delete() – The numpy. delete() is a function in Python which returns a new array with the deletion of sub-arrays along with the mentioned axis. By keeping the value of the axis as zero, there are two possible ways to delete multiple rows using numphy. delete().
np. delete(ndarray, index, axis): Delete items of rows or columns from the NumPy array based on given index conditions and axis specified, the parameter ndarray is the array on which the manipulation will happen, the index is the particular rows based on conditions to be deleted, axis=0 for removing rows in our case.
To remove an element from a NumPy array: Specify the index of the element to remove. Call the numpy. delete() function on the array for the given index.
The simplest way to delete rows and columns from arrays is the numpy.delete
method.
Suppose I have the following array x
:
x = array([[1,2,3],
[4,5,6],
[7,8,9]])
To delete the first row, do this:
x = numpy.delete(x, (0), axis=0)
To delete the third column, do this:
x = numpy.delete(x,(2), axis=1)
So you could find the indices of the rows which have a 0 in them, put them in a list or a tuple and pass this as the second argument of the function.
Here's a one liner (yes, it is similar to user333700's, but a little more straightforward):
>>> import numpy as np
>>> arr = np.array([[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
[ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])
>>> print arr[arr.all(1)]
array([[ 0.96488889, 0.73641667, 0.67521429, 0.592875 , 0.53172222]])
By the way, this method is much, much faster than the masked array method for large matrices. For a 2048 x 5 matrix, this method is about 1000x faster.
By the way, user333700's method (from his comment) was slightly faster in my tests, though it boggles my mind why.
This is similar to your original approach, and will use less space than unutbu's answer, but I suspect it will be slower.
>>> import numpy as np
>>> p = np.array([[1.5, 0], [1.4,1.5], [1.6, 0], [1.7, 1.8]])
>>> p
array([[ 1.5, 0. ],
[ 1.4, 1.5],
[ 1.6, 0. ],
[ 1.7, 1.8]])
>>> nz = (p == 0).sum(1)
>>> q = p[nz == 0, :]
>>> q
array([[ 1.4, 1.5],
[ 1.7, 1.8]])
By the way, your line p.delete()
doesn't work for me - ndarray
s don't have a .delete
attribute.
numpy provides a simple function to do the exact same thing: supposing you have a masked array 'a', calling numpy.ma.compress_rows(a) will delete the rows containing a masked value. I guess this is much faster this way...
import numpy as np
arr = np.array([[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],[ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])
print(arr[np.where(arr != 0.)])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With