How to find the unique non-NaN values in a NumPy array?

Tags: python, nan, numpy

I would like to know if there is a clean way to handle nan in numpy.

import numpy as np

my_array1 = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])
print(my_array1)
# [  5.   4.   2.   2.   4.  nan  nan   6.]
print(set(my_array1))
# {nan, nan, 2.0, 4.0, 5.0, 6.0}  (exact repr and element order depend on the Python/NumPy version)

I would have thought the set would contain at most one NaN value. Why does it contain multiple NaN values? I would also like to know how many unique non-NaN values I have in a NumPy array.

Thanks

642

asked Mar 09 '15 11:03

user2015487

2 Answers

You can use np.unique to find unique values in combination with isnan to filter the NaN values:

In [22]:

my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
np.unique(my_array1[~np.isnan(my_array1)])
Out[22]:
array([ 2.,  4.,  5.,  6.])

As to why you get multiple NaN values: NaN does not compare equal to anything, not even to itself, so the set cannot tell that two NaN elements are duplicates:

In [23]:

np.nan == np.nan
Out[23]:
False

So you have to use isnan to detect the NaN values and filter them out first. The same filtering works with set:

In [24]:

set(my_array1[~np.isnan(my_array1)])
Out[24]:
{2.0, 4.0, 5.0, 6.0}
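
To see why the set in the question keeps two NaN entries, here is a minimal sketch. Iterating over a NumPy array yields a separate scalar object for each element, and because NaN is not equal to itself, the set has no way to recognize the second NaN as a duplicate. The variable names are only illustrative.

```python
import math

import numpy as np

arr = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])

# NaN is the only float value that is not equal to itself.
nan = float("nan")
print(nan == nan)       # False
print(math.isnan(nan))  # True

# Each NaN element becomes its own scalar object when the array is iterated,
# and the two objects never compare equal, so the set stores both of them.
values = set(arr)
nan_count = sum(1 for v in values if np.isnan(v))
print(nan_count)  # 2

# Filtering with np.isnan first leaves only ordinary, comparable values.
clean = set(arr[~np.isnan(arr)])
print(len(clean))  # 4
```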

You can call len on any of the above to get a size:

In [26]:

len(np.unique(my_array1[~np.isnan(my_array1)]))
Out[26]:
4
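
If this comes up often, the filtering and counting can be wrapped in a small helper. The sketch below is one possible way to do it; the function name count_unique_non_nan is made up for illustration, and it assumes the input can be converted to a floating-point array.

```python
import numpy as np


def count_unique_non_nan(values):
    """Return the number of distinct non-NaN values in `values`."""
    # Convert to a float array so np.isnan also works on lists and int arrays.
    arr = np.asarray(values, dtype=float)
    # Boolean indexing returns a flat array, so multi-dimensional input is fine.
    return np.unique(arr[~np.isnan(arr)]).size


print(count_unique_non_nan([5, 4, 2, 2, 4, np.nan, np.nan, 6]))  # 4
print(count_unique_non_nan([[1.0, np.nan], [1.0, 3.0]]))          # 2
```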

148

answered Sep 20 '22 07:09

EdChum

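I'd suggest using pandas. pd.unique works as a near drop-in replacement for np.unique here, but it returns the values in their original order of appearance instead of sorting them. As the output below shows, it also collapses the NaN values into a single entry: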

import numpy as np
import pandas as pd

my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])

np.unique(my_array1)
# array([ 2.,  4.,  5.,  6., nan, nan])

pd.unique(my_array1)
# array([ 5.,  4.,  2., nan,  6.])

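I'm using numpy 1.17.4 and pandas 0.25.3; the np.unique output above is from that version, and NumPy 1.21 and later collapse the NaN values into a single entry by default. Hope this helps!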
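
Building on the pandas suggestion, the count the question asks for can be sketched as below. Series.nunique skips NaN by default (dropna=True), and Series.dropna().unique() returns the distinct values in order of first appearance. This is an illustrative sketch, not the answer author's code.

```python
import numpy as np
import pandas as pd

my_array1 = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])
series = pd.Series(my_array1)

# Number of distinct values, ignoring NaN (dropna=True is the default).
n_unique = series.nunique()
print(n_unique)  # 4

# Distinct non-NaN values in order of first appearance (not sorted).
ordered = series.dropna().unique()
print(ordered.tolist())  # [5.0, 4.0, 2.0, 6.0]
```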

answered Sep 22 '22 07:09

Alex
