 

Can't find nan entries using numpy in array of strings

My code is:

for x in X_cat:
    if x == np.nan:
        print('Found')

I know for a fact there are 2 NaN entries in the list, but the code runs without printing anything. The same happens if I replace np.nan with 'nan'. My final objective is to replace the NaNs with the most common string.

Peter Lynch asked Sep 05 '17 13:09


People also ask

How do you find NaN values in a NumPy array?

To check for NaN values in a NumPy array, use the np.isnan() function. It returns a boolean mask with the same shape as the original array, which is True wherever the original array holds NaN and False everywhere else.
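A minimal sketch of that mask, using a small float array made up for illustration (np.flatnonzero is one way to turn the mask into indices):

```python
import numpy as np

# Hypothetical float array with two missing values
values = np.array([1.0, np.nan, 3.0, np.nan])

# Boolean mask with the same shape as values: True where the entry is NaN
mask = np.isnan(values)
print(mask)                   # [False  True False  True]

# Positions of the NaN entries
print(np.flatnonzero(mask))   # [1 3]
```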

How do I check whether a string is NaN in Python?

The math.isnan() function from Python's standard library checks whether a number is NaN (Not a Number): it returns True if the value is NaN and False otherwise. It only accepts numbers, so passing a string raises a TypeError; a string such as 'nan' has to be converted with float() first.
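A minimal sketch of that conversion. The helper name `string_is_nan` is hypothetical and the sample strings are made up; it treats any string float() cannot parse as "not NaN":

```python
import math


def string_is_nan(text):
    """Return True if a string spells a NaN value such as 'nan' or 'NaN'."""
    try:
        # math.isnan() only accepts numbers, so parse the string first
        return math.isnan(float(text))
    except ValueError:
        # float() could not parse the string (e.g. 'hello'), so it is not NaN
        return False


print(string_is_nan('nan'))    # True
print(string_is_nan('NaN'))    # True
print(string_is_nan('hello'))  # False
print(string_is_nan('1.5'))    # False
```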

How do you filter out NaN values in NumPy?

To remove rows and columns containing missing (NaN) values from a NumPy ndarray, check for NaN with np.isnan() and keep only the rows and columns that contain no NaN by reducing that mask with any() or all().
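A short sketch of both reductions on a made-up 3x3 array with a single NaN; any() is used for rows and all() for columns purely to show both options:

```python
import numpy as np

# Hypothetical 3x3 float array with one NaN in row 1, column 1
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 6.0],
                 [7.0, 8.0, 9.0]])

nan_mask = np.isnan(data)

# Keep rows in which no element is NaN (any() reduces across each row)
rows_kept = data[~nan_mask.any(axis=1)]

# Keep columns in which every element is non-NaN (all() reduces down each column)
cols_kept = data[:, (~nan_mask).all(axis=0)]

print(rows_kept)
print(cols_kept)
```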

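Does NumPy support NaN?

Only as a floating-point value. NaN is a special IEEE 754 float value, so integer arrays cannot hold it at all; in a string array it is stored as the text 'nan', and in an object array it stays a Python float object.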
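A small sketch of how NaN ends up in each kind of array, using made-up sample values. This is also why a string array can only be searched for the text 'nan', while an object array still holds real float NaNs:

```python
import numpy as np

# Mixing strings with np.nan: NumPy picks a Unicode string dtype,
# so the NaN is stored as the three-character text 'nan'
as_str = np.array(['hello', np.nan, 'world'])
print(as_str.dtype.kind, as_str[1] == 'nan')   # U True

# With dtype=object the NaN stays a Python float
as_obj = np.array(['hello', np.nan, 'world'], dtype=object)
print(type(as_obj[1]).__name__)   # float

# An integer array cannot hold NaN at all
ints = np.array([1, 2, 3])
try:
    ints[0] = np.nan
    int_accepts_nan = True
except ValueError:
    # NumPy refuses to convert float NaN to an integer
    int_accepts_nan = False
print(int_accepts_nan)   # False
```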


4 Answers

That's because an equality comparison with NaN is always False, even NaN == NaN. So even when x is np.nan, the print will not run. (In fact, x != x used to be an accepted way of checking whether something was NaN, since no other IEEE 754 floating-point value compares unequal to itself.)

Use np.isnan(x) to check if x is NaN.
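A short sketch of this behaviour. The helper `is_nan` is a hypothetical name; it checks the type first because np.isnan raises a TypeError when given a string such as 'hello', which matters for the mixed array in the question:

```python
import numpy as np

nan = np.nan
print(nan == nan)        # False: NaN compares unequal to everything, itself included
print(nan != nan)        # True: the old self-comparison trick for spotting NaN
print(np.isnan(nan))     # True


def is_nan(value):
    """Return True only for float NaN; strings and other objects give False."""
    # np.float64 subclasses float, so it is covered as well
    return isinstance(value, float) and bool(np.isnan(value))


print([is_nan(v) for v in ['hello', np.nan, 'world', 1.5]])  # [False, True, False, False]
```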

Bathsheba answered Oct 08 '22 18:10


In an array of strings you can only perform string comparisons, so you need the string form of NaN:

str_nan = np.array([np.nan]).astype(str)[0]

Given an array like the one you describe:

x = np.array(['hello', np.nan, 'world', np.nan], dtype=object)

you can then replace these NaNs with the most common string, which I assume here to be 'mostcommonstring':

x[np.where(x.astype(str)==str_nan)]='mostcommonstring'
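A self-contained sketch of this approach that also computes the most common string, which the question asks for. Using collections.Counter for that step is an addition, not part of the original answer, and the sample data is made up:

```python
from collections import Counter

import numpy as np

# Hypothetical categorical column with float NaNs, as in the question
x = np.array(['hello', np.nan, 'world', np.nan, 'hello'], dtype=object)

# The string form of NaN is 'nan', so this mask marks the NaN entries
str_nan = np.array([np.nan]).astype(str)[0]
is_missing = x.astype(str) == str_nan

# Most common string among the non-missing entries
most_common = Counter(x[~is_missing]).most_common(1)[0][0]

# Fill every NaN with that string
x[is_missing] = most_common
print(x)
```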

Thibaut Loiseleur answered Oct 08 '22 16:10


You need to check x for NaN with np.isnan:

for x in X_cat:
    # np.isnan raises a TypeError for strings, so test float entries only
    if isinstance(x, float) and np.isnan(x):
        print('Found')

np.nan == np.nan returns False, so a direct comparison can never find a NaN. See the NumPy documentation for numpy.isnan for more details.

Oleh Rybalchenko answered Oct 08 '22 16:10


I don't have enough reputation to comment on Thibaut's answer, but to simplify it: the NaN string can be obtained with np.str_(np.nan) or simply str(np.nan).

x = np.array(['hello', np.nan, 'world', np.nan], dtype=object)

x[np.where(x.astype(str)==str(np.nan))] = 'mostcommonstring'
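One caveat with the string comparison: it also matches an entry that is genuinely the text 'nan'. The sketch below, on made-up data, contrasts it with a type-aware mask that selects only float NaN values; it also shows that a boolean mask can index the array directly, without np.where:

```python
import numpy as np

# Hypothetical data in which 'nan' also appears as genuine text
x = np.array(['hello', np.nan, 'nan', np.nan], dtype=object)

# String comparison matches the literal text 'nan' too
by_text = x.astype(str) == str(np.nan)
print(by_text.tolist())   # [False, True, True, True]

# Type-aware mask: only real float NaN values (v != v) are selected
by_type = np.array([isinstance(v, float) and v != v for v in x])
print(by_type.tolist())   # [False, True, False, True]

# Boolean indexing replaces only the real NaNs
x[by_type] = 'mostcommonstring'
print(x.tolist())
```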

thomaskolasa answered Oct 08 '22 18:10