What I want to achieve
I want to get the unique rows of a 2-D NumPy array that contains NaN values.
More generally, I would like to obtain unique values along an axis of an n-d numpy.ndarray.
A reproducible example
import numpy as np
example = np.array([[0, np.nan],
                    [np.nan, 1],
                    [0, np.nan],
                    [np.nan, np.nan],
                    [np.nan, 1],
                    [np.nan, np.nan]])
What I want as a result is:
array([[ 0., nan],
       [nan,  1.],
       [nan, nan]])
What I have tried
I have tried using np.unique, but it does not work:
np.unique(example, axis=0)
The result is:
array([[ 0., nan],
       [ 0., nan],
       [nan,  1.],
       [nan,  1.],
       [nan, nan],
       [nan, nan]])
So I discovered that np.nan == np.nan is False... :/
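A small illustration of how NaN behaves in element-wise comparisons, which is likely why np.unique keeps the duplicate rows: two rows that print identically do not compare equal once a NaN is involved, and np.isnan is the usual way to detect NaN instead.

```python
import numpy as np

# NaN is not equal to anything, including itself.
print(np.nan == np.nan)   # False
print(np.isnan(np.nan))   # True

# Two rows that look identical therefore do not compare equal element-wise.
a = np.array([0, np.nan])
b = np.array([0, np.nan])
print(a == b)             # [ True False]
print((a == b).all())     # False
```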
I have thought of using np.allclose, which has an equal_nan option, but re-implementing unique on top of it would not be efficient.
NB: I want to use this on a large scale, so it should be fast.
Does such a function exist, or do I have to write it myself? Any advice would be helpful.
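For comparison, re-implementing unique on top of np.allclose could look roughly like the sketch below. unique_rows_naive is a hypothetical name, and setting rtol=0 and atol=0 (so only exact matches, plus NaN == NaN, count as duplicates) is an assumption about the desired semantics. It shows why this route is slow: every row is checked against every row kept so far, in a Python-level loop.

```python
import numpy as np

def unique_rows_naive(arr):
    """Hypothetical helper: keep each row that does not match an already-kept row.

    equal_nan=True makes NaN compare equal to NaN; rtol=0 and atol=0 turn
    off the tolerance so only exact matches are treated as duplicates.
    Each row is compared with every kept row, so the cost grows with
    (number of rows) * (number of distinct rows).
    """
    kept = []
    for row in arr:
        if not any(np.allclose(row, k, rtol=0, atol=0, equal_nan=True) for k in kept):
            kept.append(row)
    return np.array(kept)

example = np.array([[0, np.nan],
                    [np.nan, 1],
                    [0, np.nan],
                    [np.nan, np.nan],
                    [np.nan, 1],
                    [np.nan, np.nan]])
u = unique_rows_naive(example)
```

It keeps rows in first-occurrence order, but the Python loop makes it impractical for large arrays.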
Replace nan with any value that is certainly not in the data, and np.unique will just work:
import numpy as np
example = np.array([[0, np.nan],
                    [np.nan, 1],
                    [0, np.nan],
                    [np.nan, np.nan],
                    [np.nan, 1],
                    [np.nan, np.nan]])
# substitute nan with inf (note: this modifies example in place)
example[np.isnan(example)] = np.inf
u = np.unique(example, axis=0)
# substitute inf with nan
u[u == np.inf] = np.nan
print(u)
# [[ 0. nan]
# [ nan 1.]
# [ nan nan]]
In the example I used inf, but any other value works, as long as it cannot occur in the data.
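If no safe sentinel value is available, one possible alternative (a different technique, not part of the answer above) is to build a NaN-free key for each row: the row's NaN mask followed by its values with NaN replaced by 0. Two rows get the same key exactly when they have NaN in the same positions and equal values everywhere else, so np.unique on the keys with return_index=True finds the duplicates, and the original rows are then taken by index. unique_with_nan is a hypothetical helper name, and this sketch has not been benchmarked.

```python
import numpy as np

def unique_with_nan(arr, axis=0):
    """Hypothetical helper: unique slices of arr along axis, with NaN == NaN.

    Each slice is keyed by [NaN mask | values with NaN set to 0], so the
    key never contains NaN and np.unique can collapse duplicates.
    The input array is not modified.
    """
    arr = np.asarray(arr, dtype=float)
    # Move the chosen axis to the front and flatten the remaining axes,
    # so every slice becomes one row of a 2-D array.
    moved = np.moveaxis(arr, axis, 0)
    rows = moved.reshape(moved.shape[0], -1)
    nan_mask = np.isnan(rows)
    # Key columns: NaN mask (as 0.0/1.0) followed by values with NaN zeroed.
    key = np.hstack([nan_mask, np.where(nan_mask, 0.0, rows)])
    # return_index gives, for each unique key, the index of its first occurrence.
    _, first_idx = np.unique(key, axis=0, return_index=True)
    # Take the original slices (NaN intact), in np.unique's sorted-key order.
    return np.take(arr, first_idx, axis=axis)

example = np.array([[0, np.nan],
                    [np.nan, 1],
                    [0, np.nan],
                    [np.nan, np.nan],
                    [np.nan, 1],
                    [np.nan, np.nan]])
u = unique_with_nan(example)
u_cols = unique_with_nan(example.T, axis=1)
```

The output order follows the sorted keys rather than first occurrence, and the key doubles the memory of the array being deduplicated. The axis argument covers the more general n-d case from the question.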