Getting unique couples with nan

What I want to achieve

I wish to get unique rows in a 2d numpy array containing nan.

More generally, I would like to obtain the unique values along a given axis of an n-d numpy.ndarray.

A reproducible example

import numpy as np
example = np.array([[0, np.nan], 
                    [np.nan, 1], 
                    [0, np.nan], 
                    [np.nan, np.nan], 
                    [np.nan, 1], 
                    [np.nan, np.nan]])

What I wish as a result is:

array([[ 0., nan],
       [nan,  1.],
       [nan, nan]])

What I have tried

I have tried using np.unique, but it does not work:

np.unique(example, axis=0)

Result is:

array([[ 0., nan],
       [ 0., nan],
       [nan,  1.],
       [nan,  1.],
       [nan, nan],
       [nan, nan]])

So I have discovered that np.nan == np.nan is False ... :/

I have thought of using np.allclose, which has an equal_nan option, but re-implementing unique on top of it will not be efficient.
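The two observations above can be checked directly. This is only a small illustrative sketch; the variable names are made up for the example, and it compares a single pair of rows rather than solving the uniqueness problem:

```python
import numpy as np

# Two rows that look identical but both contain nan.
a = np.array([0.0, np.nan])
b = np.array([0.0, np.nan])

# nan never compares equal to itself under IEEE 754 rules.
nan_equals_itself = (np.nan == np.nan)

# Default comparison: the nan entries make the rows count as different.
rows_equal_plain = bool(np.allclose(a, b))

# With equal_nan=True, nan entries in the same position count as equal.
rows_equal_nan = bool(np.allclose(a, b, equal_nan=True))

print(nan_equals_itself, rows_equal_plain, rows_equal_nan)
```

Comparing every pair of rows this way would be quadratic in the number of rows, which is why it does not scale.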

NB: I want to use this at large scale, so it should be fast.

Does such a function already exist? Do I have to code it myself? Any advice would be helpful.

Emmanuel-Lin asked Sep 14 '25 19:09


1 Answer

Replace nan with any value that is certainly not in the data, and np.unique will just work:

import numpy as np
example = np.array([[0, np.nan], 
                    [np.nan, 1], 
                    [0, np.nan], 
                    [np.nan, np.nan], 
                    [np.nan, 1], 
                    [np.nan, np.nan]])

# substitute nan with inf
example[np.isnan(example)] = np.inf

u = np.unique(example, axis=0)

# substitute inf with nan
u[u == np.inf] = np.nan

print(u)
# [[  0.  nan]
#  [ nan   1.]
#  [ nan  nan]]

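In the example I used inf, but any other value is fine as long as it cannot occur in the data. Note that the assignment modifies example in place; work on a copy if the original array is still needed.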
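The same idea can be wrapped in a small helper that leaves the input untouched and checks that the placeholder is really absent from the data. This is a minimal sketch, not a NumPy function: the name unique_with_nan and its sentinel parameter are hypothetical, and it assumes floating-point input.

```python
import numpy as np

def unique_with_nan(arr, axis=0, sentinel=np.inf):
    """Unique slices along `axis`, treating nan as equal to nan.

    Hypothetical helper: `sentinel` must be a value absent from `arr`.
    """
    arr = np.asarray(arr, dtype=float)
    if np.any(arr == sentinel):
        raise ValueError("sentinel value already occurs in the data")
    # np.where builds a new array, so the caller's data is not modified.
    filled = np.where(np.isnan(arr), sentinel, arr)
    uniq = np.unique(filled, axis=axis)
    # Put nan back wherever the sentinel ended up.
    uniq[uniq == sentinel] = np.nan
    return uniq

example = np.array([[0, np.nan],
                    [np.nan, 1],
                    [0, np.nan],
                    [np.nan, np.nan],
                    [np.nan, 1],
                    [np.nan, np.nan]])

result = unique_with_nan(example)
print(result)
```

Because the sorting and deduplication are still done by np.unique, the extra cost is one temporary copy of the array plus two element-wise passes.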

MB-F answered Sep 16 '25 08:09