I would like to know if there is a clean way to handle NaN values in NumPy.
my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
print my_array1
#[ 5. 4. 2. 2. 4. nan nan 6.]
print set(my_array1)
#set([nan, nan, 2.0, 4.0, 5.0, 6.0])
I would have thought it should return at most one NaN value. Why does it return multiple NaN values? I would like to know how many unique non-NaN values I have in a NumPy array.
Thanks
To find the unique elements of an array, use NumPy's np.unique() function. It returns the sorted unique elements of the array.
You can use np.unique to find unique values in combination with isnan to filter the NaN values:
In [22]:
my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
np.unique(my_array1[~np.isnan(my_array1)])
Out[22]:
array([ 2., 4., 5., 6.])
As to why you get multiple NaN values: NaN never compares equal to anything, including itself, so set cannot tell that the two NaN entries are duplicates:
In [23]:
np.nan == np.nan
Out[23]:
False
So you have to use np.isnan, rather than ==, to test for NaN.
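To make the point above concrete, here is a small sketch (the variable names are mine, not from the question). It assumes the same array as in the question and shows that the two NaN entries are neither equal nor the same object, which is why set keeps both, while np.isnan picks them out reliably.

```python
import numpy as np

my_array1 = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])

# Indexing the array creates a separate np.float64 object for each element.
first_nan, second_nan = my_array1[5], my_array1[6]

# NaN never compares equal to NaN, so == cannot detect the duplicate.
nan_equal = bool(first_nan == second_nan)
print(nan_equal)

# np.isnan tests each element for NaN and returns a boolean mask.
nan_mask = np.isnan(my_array1)
nan_count = int(nan_mask.sum())
print(nan_count)

# set() therefore keeps both NaN entries alongside the four distinct numbers.
set_size = len(set(my_array1))
print(set_size)
```
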
Using set:
In [24]:
set(my_array1[~np.isnan(my_array1)])
Out[24]:
{2.0, 4.0, 5.0, 6.0}
You can call len on any of the above to get a size:
In [26]:
len(np.unique(my_array1[~np.isnan(my_array1)]))
Out[26]:
4
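If you need this count in more than one place, the steps above can be wrapped in a small helper. This is only a sketch: the name count_unique_non_nan is made up for illustration, and it assumes the input can be converted to a float array.

```python
import numpy as np

def count_unique_non_nan(values):
    """Return the number of distinct non-NaN values (hypothetical helper)."""
    # Assumption: the input is numeric and can be converted to a float array.
    arr = np.asarray(values, dtype=float)
    # Drop the NaN entries first, then count the distinct values that remain.
    return int(np.unique(arr[~np.isnan(arr)]).size)

count = count_unique_non_nan([5, 4, 2, 2, 4, np.nan, np.nan, 6])
print(count)
```
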
I'd suggest using pandas. I think pd.unique is a direct replacement for np.unique, but unlike NumPy it keeps the values in their original order instead of sorting them, and it returns NaN only once.
import numpy as np
import pandas as pd
my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
np.unique(my_array1)
# array([ 2., 4., 5., 6., nan, nan])
pd.unique(my_array1)
# array([ 5., 4., 2., nan, 6.])
I'm using numpy 1.17.4 and pandas 0.25.3. Hope this helps!
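Building on the pandas suggestion, here is a sketch of how the original question (how many unique non-NaN values) could be answered with a Series. It relies on Series.dropna, Series.unique and Series.nunique; nunique skips NaN by default (dropna=True). The variable names are only for illustration. Note also that the np.unique output shown above comes from NumPy 1.17.4; as far as I know, more recent NumPy releases collapse repeated NaN values into one, so that output may differ on a newer install.

```python
import numpy as np
import pandas as pd

my_array1 = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])
series = pd.Series(my_array1)

# Unique non-NaN values, in order of first appearance (not sorted).
ordered_unique = series.dropna().unique()
print(ordered_unique)

# Number of distinct values; NaN is excluded by default (dropna=True).
unique_count = series.nunique()
print(unique_count)
```
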