How to find the unique non-NaN values in a NumPy array?

Tags: python, nan, numpy

I would like to know if there is a clean way to handle NaN values in NumPy.

import numpy as np

my_array1 = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])
print(my_array1)
# [ 5.  4.  2.  2.  4. nan nan  6.]
print(set(my_array1))
# {nan, nan, 2.0, 4.0, 5.0, 6.0}  (element order and scalar repr vary by version)

I expected the set to contain at most one NaN. Why does it contain several? What I actually want is the number of unique non-NaN values in a NumPy array.

Thanks

Asked by user2015487 on Mar 09 '15 at 11:03



2 Answers

You can use np.unique to find unique values in combination with isnan to filter the NaN values:

In [22]:

my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
np.unique(my_array1[~np.isnan(my_array1)])
Out[22]:
array([ 2.,  4.,  5.,  6.])

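As for why you get multiple NaN values: NaN never compares equal to anything, including itself, and each element that set() pulls out of the array is a separate float object, so the set has no way to tell that two NaNs are duplicates: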

In [23]:

np.nan == np.nan
Out[23]:
False

So you have to use isnan to detect NaN values instead of comparing with ==.
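
To make that concrete, here is a small sketch of what set() runs into. The variable names are illustrative and not part of the original answer; the point is only that NaNs taken from an array are distinct objects that never compare equal, while the same NaN object listed twice is deduplicated by identity.

```python
import numpy as np

arr = np.array([np.nan, np.nan])

# Each indexing operation (and each step of iteration) builds a new
# NumPy float scalar, so the two NaNs are different Python objects.
first, second = arr[0], arr[1]
nan_equal = bool(first == second)   # NaN never equals NaN
same_object = first is second

# set() treats two items as duplicates only if they are the same object
# or compare equal; neither holds for two distinct NaN objects.
from_array = set(arr)

# Here the very same object is listed twice, so the identity check
# succeeds and only one NaN is kept.
from_same_object = {np.nan, np.nan}

print(nan_equal, same_object, len(from_array), len(from_same_object))
# False False 2 1
```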

The same filter works with set:

In [24]:

set(my_array1[~np.isnan(my_array1)])
Out[24]:
{2.0, 4.0, 5.0, 6.0}

You can call len on any of the above to get the number of unique non-NaN values:

In [26]:

len(np.unique(my_array1[~np.isnan(my_array1)]))
Out[26]:
4
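
If you need this count in several places, the filter and the count can be wrapped in a small helper. This is only a sketch: count_unique_non_nan is a hypothetical name, not a NumPy function, and it assumes the input can be converted to a float array.

```python
import numpy as np

def count_unique_non_nan(values):
    """Return the number of distinct non-NaN values in `values`."""
    arr = np.asarray(values, dtype=float)  # also accepts plain lists
    mask = ~np.isnan(arr)                  # True where the value is not NaN
    return int(np.unique(arr[mask]).size)

my_array1 = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])
print(count_unique_non_nan(my_array1))
# 4
```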
Answered by EdChum on Sep 20 '22 at 07:09


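I'd suggest using pandas. I think pd.unique is close to a drop-in replacement for np.unique, but it keeps the values in order of first appearance instead of sorting them, and it reports repeated NaN values only once. Note that the NaN itself still appears in the result.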

import numpy as np
import pandas as pd

my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])

np.unique(my_array1)
# array([ 2.,  4.,  5.,  6., nan, nan])

pd.unique(my_array1)
# array([ 5.,  4.,  2., nan,  6.]) 

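I'm using numpy 1.17.4 and pandas 0.25.3. On newer NumPy releases (1.21 and later, as far as I can tell) np.unique collapses repeated NaN values into one by default, so the np.unique output above may differ. Hope this helps!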
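
Since the question asks for the non-NaN values specifically, one way to get them with pandas is to drop the NaN before calling unique, or to use Series.nunique, which skips NaN by default. This is a minimal sketch; the names series, unique_in_order and n_unique are only illustrative.

```python
import numpy as np
import pandas as pd

my_array1 = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])
series = pd.Series(my_array1)

# Drop NaN first, then take unique values in order of first appearance.
unique_in_order = series.dropna().unique()

# nunique() excludes NaN because its dropna parameter defaults to True.
n_unique = series.nunique()

print(unique_in_order, n_unique)
# [5. 4. 2. 6.] 4
```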

Answered by Alex on Sep 22 '22 at 07:09