
Filter NaN values in a dataframe column

y = data.loc[data['column1'] != float('NaN'),'column1']

The code above still returns rows with NaN values in 'column1'. I'm not sure what I'm doing wrong. Please help!

pilz2985 asked Sep 15 '25 12:09


1 Answer

NaN, by definition, is not equal to NaN, so an equality or inequality test against it can never single out the missing values.

In [1262]: np.nan == np.nan
Out[1262]: False

This behaviour comes from the IEEE 754 floating-point standard; the Wikipedia article on NaN covers the details.
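To see why the filter in the question keeps every row, here is a minimal sketch. The sample values are made up for illustration and are not the asker's data. Because `NaN != NaN` evaluates to True, the mask is True everywhere, including on the NaN row:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the asker's actual DataFrame.
data = pd.DataFrame({"column1": [1.0, 2.0, np.nan, 4.0]})

# NaN != NaN is True, so every row passes the comparison,
# including the row that holds NaN.
mask = data["column1"] != float("nan")
y = data.loc[mask, "column1"]

print(mask.all())      # True when no row was filtered out
print(y.isna().sum())  # number of NaN values that survived the filter
```

The comparison therefore filters nothing, which matches the behaviour described in the question.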


Option 1

Using pd.Series.notnull:

df

   column1
0      1.0
1      2.0
2    345.0
3      NaN
4      4.0
5     10.0
6      NaN
7    100.0
8      NaN

y = df.loc[df.column1.notnull(), 'column1']
y

0      1.0
1      2.0
2    345.0
4      4.0
5     10.0
7    100.0
Name: column1, dtype: float64

Option 2

As MSeifert suggested, you could use np.isnan:

y = df.loc[~np.isnan(df.column1), 'column1']
y

0      1.0
1      2.0
2    345.0
4      4.0
5     10.0
7    100.0
Name: column1, dtype: float64
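One caveat with Option 2, sketched below on a made-up object-dtype Series: `np.isnan` is only defined for numeric input, so it raises a TypeError on an object array, whereas `notnull` still works there. The Series contents are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype column mixing strings and missing values.
s = pd.Series(["a", None, "c", np.nan], dtype=object)

try:
    np.isnan(s.to_numpy())
    isnan_ok = True
except TypeError:
    # np.isnan has no implementation for object arrays.
    isnan_ok = False

# notnull treats both None and NaN as missing, regardless of dtype.
kept = s[s.notnull()]

print(isnan_ok)
print(kept.tolist())
```

For a float column like the one in this answer, both options give the same result.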

Option 3

If it's just the one column, call pd.Series.dropna:

y = df.column1.dropna()
y

0      1.0
1      2.0
2    345.0
4      4.0
5     10.0
7    100.0
Name: column1, dtype: float64
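If the other columns need to be kept as well, a possible variant is `DataFrame.dropna` with its `subset` parameter, which drops whole rows based on NaN in the named column only. A minimal sketch, where `column2` is a hypothetical extra column added for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame; column2 is not part of the original question.
df = pd.DataFrame({
    "column1": [1.0, np.nan, 4.0],
    "column2": ["a", "b", None],
})

# Drop rows where column1 is NaN; missing values in column2 are ignored.
rows = df.dropna(subset=["column1"])

print(rows)
```

Rows are removed only when `column1` is missing, so a row with a missing `column2` but a valid `column1` stays.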
cs95 answered Sep 18 '25 10:09