Can't find nan entries using numpy in array of strings
My code is:
for x in X_cat:
    if x == np.nan:
        print('Found')
I know for a fact there are 2 nan entries in the list, but the code runs without printing anything. The same happens if I replace np.nan with 'nan'. My final objective is to replace the nan entries with the most common string.
To check for NaN values in a NumPy array, you can use np.isnan(). It returns a boolean mask with the same shape as the original array: True at the positions that hold NaN and False everywhere else.
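As a minimal sketch of the mask described above, assuming a small float array (the names values, mask and nan_positions are only illustrative); note that np.isnan expects a numeric dtype, so this does not apply directly to an array of strings:

```python
import numpy as np

# Illustrative float array with two NaN entries
values = np.array([1.0, np.nan, 3.0, np.nan])

# Boolean mask with the same shape as values
mask = np.isnan(values)

# Indices where the mask is True
nan_positions = np.flatnonzero(mask)

print(mask)
print(nan_positions)
```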
Using math: math.isnan() is a function in Python's standard math module that checks whether a value is NaN (Not a Number). It returns True if the value is NaN and False otherwise.
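A short sketch of math.isnan on single values. Since math.isnan raises a TypeError when given a string, the helper is_nan below (a hypothetical name, not part of any library) checks the type first so it can be used on mixed data:

```python
import math

def is_nan(value):
    """Return True only for float NaN; strings and other types give False."""
    return isinstance(value, float) and math.isnan(value)

print(is_nan(float("nan")))
print(is_nan(1.5))
print(is_nan("hello"))
```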
To remove rows or columns containing missing values (NaN) from a NumPy array (numpy.ndarray), build a mask with np.isnan() and use any() or all() to extract the rows and columns that do not contain NaN.
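One way this could look, assuming a small 2-D float array made up for illustration (data, nan_mask and the result names are not from the original text):

```python
import numpy as np

# Illustrative 3x3 float array with NaN in rows 0 and 2
data = np.array([[1.0, 2.0, np.nan],
                 [4.0, 5.0, 6.0],
                 [7.0, np.nan, 9.0]])

nan_mask = np.isnan(data)

# Keep rows in which no element is NaN
rows_without_nan = data[~nan_mask.any(axis=1)]

# Keep columns in which no element is NaN
cols_without_nan = data[:, ~nan_mask.any(axis=0)]

print(rows_without_nan)
print(cols_without_nan)
```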
No, you can't, at least with the current version of NumPy. NaN is a special value for float arrays only.
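A small sketch of what this means in practice, assuming the array is built directly from a Python list (the variable names are illustrative). In the NumPy versions I am aware of, a mixed list without dtype=object becomes a string array in which np.nan is stored as the text 'nan', while dtype=object keeps the entry as a float:

```python
import numpy as np

# Without dtype=object the whole array becomes a string array
as_strings = np.array(['hello', np.nan, 'world'])

# With dtype=object every element keeps its original Python type
as_objects = np.array(['hello', np.nan, 'world'], dtype=object)

print(as_strings.dtype.kind, repr(as_strings[1]))
print(as_objects.dtype, type(as_objects[1]))
```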
That's because comparing anything with NaN for equality, including NaN itself, is False. So even when x is np.nan, the print will not run. (In fact, x != x used to be an acceptable way of checking whether something was NaN, as no other IEEE 754 floating-point value has that property.) Use np.isnan(x) to check if x is NaN.
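The comparison behaviour can be checked directly. This is only a sketch with illustrative variable names:

```python
import numpy as np

x = np.nan

# Equality with NaN is always False, even against itself
equal_to_itself = (x == x)

# Inequality is therefore True, which is the old self-comparison trick
differs_from_itself = (x != x)

# The explicit check
detected = bool(np.isnan(x))

print(equal_to_itself, differs_from_itself, detected)
```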
In an array of strings, you can only perform string comparisons, so you first need the string representation of nan:
str_nan = np.array([np.nan]).astype(str)[0]
Then, with an array initialized the way you describe:
x = np.array(['hello', np.nan, 'world', np.nan], dtype=object)
you can replace these nan entries with the most common string, which I assume here to be 'mostcommonstring':
x[np.where(x.astype(str) == str_nan)] = 'mostcommonstring'
You need to check x for NaN with np.isnan. Because np.isnan raises a TypeError on strings, test the type first:
for x in X_cat:
    if isinstance(x, float) and np.isnan(x):
        print('Found')
np.nan == np.nan returns False, so direct comparison is meaningless here. Find more about isnan in the NumPy docs.
Not enough reputation to comment on Thibaut's answer, but to simplify it:
The nan string can be np.str_(np.nan) or even str(np.nan).
x = np.array(['hello', np.nan, 'world', np.nan], dtype=object)
x[np.where(x.astype(str)==str(np.nan))] = 'mostcommonstring'
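For the stated goal of filling in the actual most common string rather than a placeholder, a possible sketch follows. It assumes an object array like the one above and uses collections.Counter to find the mode; nan_mask and most_common are illustrative names. The mask here checks the element type instead of comparing against the text 'nan', so a genuine string 'nan' would not be replaced by accident:

```python
from collections import Counter

import numpy as np

x = np.array(['hello', np.nan, 'world', 'hello', np.nan], dtype=object)

# True where the element is a float NaN, False for every string
nan_mask = np.array([isinstance(v, float) and np.isnan(v) for v in x])

# Most frequent value among the non-NaN entries
most_common = Counter(x[~nan_mask]).most_common(1)[0][0]

# Overwrite the NaN entries in place
x[nan_mask] = most_common

print(x)
```

If several strings are tied for most frequent, Counter.most_common returns the one encountered first.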