I would like to know if there is a clean way to handle nan in numpy.
import numpy as np
my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
print(my_array1)
#[ 5. 4. 2. 2. 4. nan nan 6.]
print(set(my_array1))
#set([nan, nan, 2.0, 4.0, 5.0, 6.0])
I would have thought it should return at most one nan value. Why does it return multiple nan values? I would like to know how many unique non-nan values I have in a numpy array.
Thanks
To find the unique elements of an array, use the numpy.unique() function from the NumPy library. It returns the sorted unique elements of the array.
You can use np.unique to find unique values in combination with isnan to filter the NaN values:
In [22]:
my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
np.unique(my_array1[~np.isnan(my_array1)])
Out[22]:
array([ 2., 4., 5., 6.])
As to why you get multiple NaN values, it's because NaN values cannot be compared normally:
In [23]:
np.nan == np.nan
Out[23]:
False
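Here is a minimal sketch of why the set ends up holding more than one NaN. It assumes CPython's set behaviour (identity is checked before ==) and that iterating a NumPy array produces a separate scalar object for each element. The variable names are only for illustration:

```python
import numpy as np

nan = float("nan")

# NaN never compares equal to anything, including itself.
nan_equals_itself = (nan == nan)

# The same object stored twice collapses to one entry, because set
# membership checks identity before falling back to ==.
same_object_count = len({nan, nan})

# Iterating an array yields a separate scalar per element, so the two
# NaNs are different objects that also fail ==, and both are kept.
arr = np.array([np.nan, np.nan])
distinct_object_count = len(set(arr))

print(nan_equals_itself, same_object_count, distinct_object_count)
```

So the duplicates in the set come from distinct NaN objects that are never equal to each other, not from a bug in set.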
So you have to use isnan to filter out the NaN values before building the set:
In [24]:
set(my_array1[~np.isnan(my_array1)])
Out[24]:
{2.0, 4.0, 5.0, 6.0}
You can call len on any of the above to get a size:
In [26]:
len(np.unique(my_array1[~np.isnan(my_array1)]))
Out[26]:
4
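The steps above could be wrapped in a small helper. This is only a sketch: the function name count_unique_non_nan is made up for illustration, and it assumes the input can be converted to a float array:

```python
import numpy as np

def count_unique_non_nan(values):
    """Return how many distinct non-NaN values are in `values`."""
    # Assumption: the input is numeric and can be cast to float.
    arr = np.asarray(values, dtype=float)
    # Drop NaN entries first, then deduplicate what is left.
    return np.unique(arr[~np.isnan(arr)]).size

my_array1 = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])
count = count_unique_non_nan(my_array1)
print(count)
```

Filtering before calling np.unique means the result does not depend on how np.unique treats NaN.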
I'd suggest using pandas. I think pd.unique is a direct replacement for np.unique, but unlike numpy it keeps the values in their original order instead of sorting them.
import numpy as np
import pandas as pd
my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
np.unique(my_array1)
# array([ 2., 4., 5., 6., nan, nan])
pd.unique(my_array1)
# array([ 5., 4., 2., nan, 6.])
I'm using numpy 1.17.4 and pandas 0.25.3. Hope this helps!
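If only the count of unique non-NaN values is needed, something along these lines may work, assuming pandas is installed. Series.nunique skips NaN by default (dropna=True). The variable names are just for illustration:

```python
import numpy as np
import pandas as pd

my_array1 = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])

# nunique() ignores NaN unless dropna=False is passed.
n_unique = pd.Series(my_array1).nunique()

# Unique non-NaN values, kept in order of first appearance.
ordered_unique = pd.Series(my_array1).dropna().unique()

print(n_unique)
print(ordered_unique)
```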