
The difference between comparison to np.nan and isnull()

I supposed that

data[data.agefm.isnull()] 

and

data[data.agefm == numpy.nan] 

are equivalent. But they are not: the first correctly returns the rows where agefm is NaN, while the second returns an empty DataFrame. I thought that omitted values are always equal to np.nan, but that seems to be wrong.

The agefm column has float64 type:

    (Pdb) data.agefm.describe()
    count    2079.000000
    mean       20.686388
    std         5.002383
    min        10.000000
    25%        17.000000
    50%        20.000000
    75%        23.000000
    max        46.000000
    Name: agefm, dtype: float64

Could you please explain what data[data.agefm == np.nan] means exactly?

sergzach asked Dec 27 '16 09:12


1 Answer

np.nan cannot be compared to np.nan directly: an equality test against NaN always evaluates to False.

    np.nan == np.nan
    False

While

    np.isnan(np.nan)
    True

You could also use

    pd.isnull(np.nan)
    True
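To connect this to the original question: a comparison such as `series == np.nan` is evaluated element by element, so every element (including the NaN ones) compares as False and the resulting boolean mask selects nothing. The sketch below uses a small made-up Series (not the asker's data) to contrast that mask with the one produced by `isnull()`.

```python
import numpy as np
import pandas as pd

# Made-up example data; the values are an assumption for illustration only.
s = pd.Series([1.0, np.nan, 2.0])

# Elementwise equality: NaN never compares equal to anything, even to itself,
# so the mask is False everywhere.
eq_mask = s == np.nan

# Explicit missing-value check: True exactly where the value is NaN.
null_mask = s.isnull()

print(eq_mask.tolist())    # [False, False, False]
print(null_mask.tolist())  # [False, True, False]
```

Indexing with `eq_mask` therefore returns an empty result, while indexing with `null_mask` returns the missing rows.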

Examples
Filters nothing because nothing is equal to np.nan

    s = pd.Series([1., np.nan, 2.])
    s[s != np.nan]

    0    1.0
    1    NaN
    2    2.0
    dtype: float64

Filters out the null

    s = pd.Series([1., np.nan, 2.])
    s[s.notnull()]

    0    1.0
    2    2.0
    dtype: float64

Use the odd comparison behavior to get what we want anyway. Because np.nan != np.nan is True, s == s is False exactly at the NaN positions, so

    s = pd.Series([1., np.nan, 2.])
    s[s == s]

    0    1.0
    2    2.0
    dtype: float64

Just dropna

    s = pd.Series([1., np.nan, 2.])
    s.dropna()

    0    1.0
    2    2.0
    dtype: float64
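The same reasoning carries over to the DataFrame filter in the question. Here is a minimal sketch, assuming a small stand-in DataFrame with an `agefm` column; the values are hypothetical, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's DataFrame.
data = pd.DataFrame({"agefm": [17.0, np.nan, 23.0, np.nan]})

# The equality mask is False in every row, so nothing is selected.
empty = data[data.agefm == np.nan]

# isnull() marks the NaN rows; notnull() marks the others.
missing = data[data.agefm.isnull()]
present = data[data.agefm.notnull()]

print(len(empty), len(missing), len(present))  # 0 2 2
```

So `data[data.agefm == np.nan]` means "rows where agefm compares equal to NaN", and since no value ever does, the result is always empty.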
piRSquared answered Sep 24 '22 19:09