Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filter out nan rows in a specific column

df =  Col1 Col2 Col3 1    nan  4 2    5    4 3    3    nan 

Given the dataframe df, I want to obtain a new dataframe df2 that does not contain nan in the column Col2. This is the expected result: df2 =

Col1 Col2 Col3 2    5    4 3    3    nan 

I know that it's possible to use pandas.isnull and dropna, however how to specify only particular column to which filtering should be applied?

like image 812
Dinosaurius Avatar asked May 06 '17 14:05

Dinosaurius


People also ask

How do you filter out rows based on column values?

DataFrame. query() function is used to filter rows based on column value in pandas. After applying the expression, it returns a new DataFrame. If you wanted to update the existing DataFrame use inplace=True param.

How do I remove NaN values from a column?

DataFrame. dropna() is used to drop/remove columns with NaN / None values. Python doesn't support Null hence any missing data is represented as None or NaN values. NaN stands for Not A Number and is one of the common ways to represent the missing values in the data.

How do you filter Null values in a DataFrame?

In Spark, using filter() or where() functions of DataFrame we can filter rows with NULL values by checking IS NULL or isNULL . These removes all rows with null values on state column and returns the new DataFrame. All above examples returns the same output.


2 Answers

you can use DataFrame.dropna() method:

In [202]: df.dropna(subset=['Col2']) Out[202]:    Col1  Col2  Col3 1     2   5.0   4.0 2     3   3.0   NaN 

or (in this case) less idiomatic Series.notnull():

In [204]: df.loc[df.Col2.notnull()] Out[204]:    Col1  Col2  Col3 1     2   5.0   4.0 2     3   3.0   NaN 

or using DataFrame.query() method:

In [205]: df.query("Col2 == Col2") Out[205]:    Col1  Col2  Col3 1     2   5.0   4.0 2     3   3.0   NaN 

numexpr solution:

In [241]: import numexpr as ne  In [242]: col = df.Col2  In [243]: df[ne.evaluate("col == col")] Out[243]:    Col1  Col2  Col3 1     2   5.0   4.0 2     3   3.0   NaN 
like image 140
MaxU - stop WAR against UA Avatar answered Sep 27 '22 22:09

MaxU - stop WAR against UA


Use dropna:

df = df.dropna(subset=['Col2']) print (df)   Col1  Col2  Col3 1     2   5.0   4.0 2     3   3.0   NaN 

Another solution - boolean indexing with notnull:

df = df[df['Col2'].notnull()] print (df)    Col1  Col2  Col3 1     2   5.0   4.0 2     3   3.0   NaN 

What is same as:

df = df[~df['Col2'].isnull()] print (df)    Col1  Col2  Col3 1     2   5.0   4.0 2     3   3.0   NaN 
like image 27
jezrael Avatar answered Sep 27 '22 23:09

jezrael