Basically, I'm doing some data analysis. I read in a dataset as a numpy.ndarray and some of the values are missing (either by just not being there, being NaN, or by being a string written "NA").
I want to clean out all rows containing any entry like this. How do I do that with a numpy ndarray?
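If you want to replace missing values rather than drop them, a common way is the .fillna() method. Note that this is a pandas method (on DataFrame and Series), not a NumPy one, so it is not available on an ndarray. It requires you to specify a value to replace the NaNs with.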
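For a plain ndarray, the same kind of replacement can be sketched with np.where. This is only an illustration: the array and the fill value 0.0 are assumptions, not part of the question.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, np.nan], [7.0, 8.0, 9.0]])

# Replace each NaN with a fill value (0.0 is an arbitrary choice for illustration).
filled = np.where(np.isnan(a), 0.0, a)
```

This keeps every row, so it only fits if a substitute value makes sense for your analysis.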
How to drop all missing values from a numpy array? Dropping the missing (NaN) values can be done with the function numpy.isnan(), which returns a boolean mask that is True wherever a value is NaN (a mask, not a list of indexes). Combining it with numpy.logical_not() reverses the boolean values, so the mask marks the entries to keep.
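A minimal sketch of that combination on a 1-D array (the sample values are made up for illustration):

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

mask = np.isnan(x)           # True where the value is NaN
keep = np.logical_not(mask)  # True where the value is present
cleaned = x[keep]            # boolean indexing keeps only the True positions

print(cleaned)  # [1. 3. 5.]
```

On a 1-D array this drops individual values; for a 2-D array you would first collapse the mask to one value per row, for example with .any(axis=1).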
Using the NumPy function np.delete(), you can delete any row or column from a NumPy ndarray. Specify the axis (dimension) and the position (row number, column number, etc.). It is also possible to select multiple rows or columns using a slice or a list. np.delete() returns a new array and leaves the original unchanged.
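One way to apply np.delete() to this problem, sketched under the assumption that the array is numeric and missing values are NaN, is to look up the offending row numbers first:

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, np.nan], [7.0, 8.0, 9.0]])

# Indices of rows that contain at least one NaN.
bad_rows = np.where(np.isnan(a).any(axis=1))[0]

# axis=0 removes whole rows; the result is a new array.
cleaned = np.delete(a, bad_rows, axis=0)
```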
>>> import numpy as np
>>> a = np.array([[1,2,3], [4,5,np.nan], [7,8,9]])
>>> a
array([[ 1.,  2.,  3.],
       [ 4.,  5., nan],
       [ 7.,  8.,  9.]])
>>> a[~np.isnan(a).any(axis=1)]
array([[ 1.,  2.,  3.],
       [ 7.,  8.,  9.]])
and reassign this to a.
Explanation: np.isnan(a) returns an array of the same shape with True where the value is NaN and False elsewhere. .any(axis=1) reduces an m*n array to a length-m array with a logical or across each row, ~ inverts True/False, and a[ ] selects just the rows of the original array for which the value within the brackets is True.
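np.isnan() only works on numeric arrays, so it will not catch the string "NA" from the question. A rough sketch for a mixed array follows, assuming the data is held with dtype=object; the helper name drop_missing_rows and the markers parameter are hypothetical, not part of NumPy.

```python
import numpy as np

def drop_missing_rows(arr, markers=("NA",)):
    """Hypothetical helper: drop rows with None, NaN, or a marker string."""
    arr = np.asarray(arr, dtype=object)

    def is_missing(v):
        if v is None:
            return True
        if isinstance(v, str):
            return v.strip() in markers
        # NaN is the only float that is not equal to itself.
        return isinstance(v, float) and v != v

    # Element-wise boolean mask with the same shape as arr.
    mask = np.vectorize(is_missing, otypes=[bool])(arr)
    return arr[~mask.any(axis=1)]

raw = np.array(
    [[1.0, 2.0, "x"], [4.0, "NA", "y"], [7.0, np.nan, "z"], [8.0, 9.0, "w"]],
    dtype=object,
)
cleaned = drop_missing_rows(raw)  # keeps the first and last rows
```

If the dataset was read in as a string array instead, NaN would show up as text such as "nan", so you would need to add that to markers.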