Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas - dropping rows with missing data not working using .isnull(), notnull(), dropna()

Tags:

python

pandas

This is really weird. I have tried several ways of dropping rows with missing data from a pandas dataframe, but none of them seem to work. This is the code (I just uncomment one of the methods used - but these are the three that I used in different modifications - this is the latest):

import pandas as pd
Test = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,'NaN',4,5],'C':[1,2,3,'NaT',5]})
print(Test)
#Test = Test.ix[Test.C.notnull()]
#Test = Test.dropna()
Test = Test[~Test[Test.columns.values].isnull()]
print "And now"
print(Test)

But in all cases, all I get is this:

   A    B    C
0  1    1    1
1  2    2    2
2  3  NaN    3
3  4    4  NaT
4  5    5    5
And now
   A    B    C
0  1    1    1
1  2    2    2
2  3  NaN    3
3  4    4  NaT
4  5    5    5

Is there any mistake that I am making? or what is the problem? Ideally, I would like to get this:

   A    B    C
0  1    1    1
1  2    2    2
4  5    5    5
like image 593
durbachit Avatar asked Sep 06 '16 02:09

durbachit


People also ask

What does Dropna () do in Python?

The dropna() method removes the rows that contains NULL values. The dropna() method returns a new DataFrame object unless the inplace parameter is set to True , in that case the dropna() method does the removing in the original DataFrame instead.

Does Dropna remove rows?

dropna() method is your friend. When you call dropna() over the whole DataFrame without specifying any arguments (i.e. using the default behaviour) then the method will drop all rows with at least one missing value.

How do I drop missing values in Pandas?

The dropna() function is used to remove missing values. Determine if rows or columns which contain missing values are removed. 0, or 'index' : Drop rows which contain missing values.


2 Answers

Try this on orig data:

Test.replace(["NaN", 'NaT'], np.nan, inplace = True)
Test = Test.dropna()
Test

Or Modify data and do this

import pandas as pd
import numpy as np 

Test = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,np.nan,4,5],'C':[1,2,3,pd.NaT,5]})
print(Test)
Test = Test.dropna()
print(Test)



   A    B  C
0  1  1.0  1
1  2  2.0  2
4  5  5.0  5
like image 187
Merlin Avatar answered Sep 30 '22 21:09

Merlin


Your example DF has NaN and NaT as strings which .dropna, .notnull and co. won't consider falsey, so given your example you can use...

df[~df.isin(['NaN', 'NaT']).any(axis=1)]

Which gives you:

   A  B  C
0  1  1  1
1  2  2  2
4  5  5  5

If you had a DF such as (note of the use of np.nan and np.datetime64('NaT') instead of strings:

df = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,np.nan,4,5],'C':[1,2,3,np.datetime64('NaT'),5]})

Then running df.dropna() which give you:

   A    B  C
0  1  1.0  1
1  2  2.0  2
4  5  5.0  5

Note that column B is now a float instead of an integer as that's required to store NaN values.

like image 31
Jon Clements Avatar answered Sep 30 '22 23:09

Jon Clements