 

What is the Right Syntax When Using .notnull() in Pandas?

I want to use .notnull() on several columns of a dataframe to eliminate the rows which contain "NaN" values.

Let's say I have the following df:

  A   B   C
0 1   1   1
1 1   NaN 1
2 1   NaN NaN
3 NaN 1   1

I tried this syntax, but it does not work. Do you know what I am doing wrong?

df[[df.A.notnull()],[df.B.notnull()],[df.C.notnull()]]

I get this error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

What should I do to get the following output?

  A   B   C
0 1   1   1

Any idea?

MEhsan asked Aug 01 '16 15:08



3 Answers

You can simply do:

df.dropna()
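
For a reproducible check, here is a minimal sketch that rebuilds the DataFrame from the question and applies dropna(). Building the frame with np.nan is an assumption about how the original data was created; the question only shows the printed table.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; np.nan marks the missing cells (assumed construction).
df = pd.DataFrame({
    "A": [1, 1, 1, np.nan],
    "B": [1, np.nan, np.nan, 1],
    "C": [1, 1, np.nan, 1],
})

# With no arguments, dropna() removes every row that contains at least one NaN.
clean = df.dropna()
print(clean)
```

Only the row with index 0 has no missing value, so it is the only row left in clean.
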
Sudhin Joseph answered Oct 25 '22 03:10


You can first select a subset of columns with df[['A','B','C']], then apply notnull() and use all() across the columns (axis 1) to flag the rows where every value in the mask is True:

print (df[['A','B','C']].notnull())
       A      B      C
0   True   True   True
1   True  False   True
2   True  False  False
3  False   True   True

print (df[['A','B','C']].notnull().all(axis=1))
0     True
1    False
2    False
3    False
dtype: bool

print (df[df[['A','B','C']].notnull().all(axis=1)])
     A    B    C
0  1.0  1.0  1.0

Another solution, from Ayhan's comment, uses dropna:

print (df.dropna(subset=['A', 'B', 'C']))
     A    B    C
0  1.0  1.0  1.0

which is the same as:

print (df.dropna(subset=['A', 'B', 'C'], how='any'))

and means: drop every row that has at least one NaN value in the listed columns.
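
To make the effect of subset and how concrete, here is a small sketch on the question's data (rebuilt with np.nan, which is an assumption about how the frame was created). It contrasts how='any' with how='all', which drops a row only when every listed column is NaN.

```python
import numpy as np
import pandas as pd

# The question's frame, rebuilt by hand (assumed construction).
df = pd.DataFrame({
    "A": [1, 1, 1, np.nan],
    "B": [1, np.nan, np.nan, 1],
    "C": [1, 1, np.nan, 1],
})

# how='any' (the default): drop a row if any of the listed columns is NaN.
any_rows = df.dropna(subset=["A", "B", "C"], how="any")

# how='all': drop a row only if all of the listed columns are NaN.
all_rows = df.dropna(subset=["B", "C"], how="all")

# A narrower subset ignores NaN values in the other columns.
only_a = df.dropna(subset=["A"])

print(any_rows.index.tolist())
print(all_rows.index.tolist())
print(only_a.index.tolist())
```

With how='all' on B and C, only row 2 is removed, because it is the only row where both columns are missing.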

jezrael answered Oct 25 '22 04:10


You can apply multiple conditions by combining them with the & operator (this works for any boolean masks, not only those produced by notnull()).

df[(df.A.notnull() & df.B.notnull() & df.C.notnull())]
     A    B    C
0  1.0  1.0  1.0
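
A short sketch of the same idea, under the assumption that the frame is built with np.nan as below. The key passed to df[...] has to be one boolean Series, which is why the masks are joined with & instead of being listed with commas. The helper names mask_from_list and mixed are only illustrative.

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd

# The question's frame, rebuilt by hand (assumed construction).
df = pd.DataFrame({
    "A": [1, 1, 1, np.nan],
    "B": [1, np.nan, np.nan, 1],
    "C": [1, 1, np.nan, 1],
})

# & combines the boolean masks element-wise into a single Series.
mask = df["A"].notnull() & df["B"].notnull() & df["C"].notnull()

# The same mask built from an arbitrary list of column names.
cols = ["A", "B", "C"]
mask_from_list = reduce(operator.and_, (df[c].notnull() for c in cols))

# Mixing in a comparison: the parentheses are required,
# because & binds more tightly than ==.
mixed = df[df["A"].notnull() & (df["C"] == 1)]

print(df[mask])
print(mixed)
```

The reduce form is handy when the column list is not known in advance; for a fixed handful of columns the explicit & chain is easier to read.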

Alternatively, you can just drop all rows which contain NaN. The original DataFrame is not modified; instead, a copy is returned.

df.dropna()
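
A minimal sketch of the copy behaviour on the question's data (rebuilt here with np.nan, which is an assumption). It also shows axis=1, which makes dropna() remove columns instead of rows.

```python
import numpy as np
import pandas as pd

# The question's frame, rebuilt by hand (assumed construction).
df = pd.DataFrame({
    "A": [1, 1, 1, np.nan],
    "B": [1, np.nan, np.nan, 1],
    "C": [1, 1, np.nan, 1],
})

# dropna() returns a new DataFrame; df itself keeps all four rows.
result = df.dropna()

# axis=1 drops columns that contain NaN. Every column here has one,
# so no columns remain while the four-row index is kept.
by_column = df.dropna(axis=1)

print(len(df), len(result))
print(by_column.shape)
```

To keep the filtered result, assign it back, for example df = df.dropna().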

Jan Trienes answered Oct 25 '22 04:10