
Python Pandas: get rows of a DataFrame where a column is not null

I'm filtering my DataFrame by dropping the rows in which the value of a specific column is None.

df = df[df['my_col'].isnull() == False]

This works fine, but PyCharm tells me:

PEP8: comparison to False should be 'if cond is False:' or 'if not cond:'

How should I apply this to my use case? Using 'not ...' or '... is False' did not work. My current solution is:

df = df[df['my_col'].notnull()]
asked Apr 05 '18 13:04 by Matthias




1 Answer

Python has the short-circuiting logical operators not, and, or. These have a very specific meaning in Python and cannot be overloaded to work element-wise: not must return a single bool, and a and/or b always returns either a or b, or raises an error.
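As a minimal sketch of why the two attempts from the question fail (the Series `s` here is made up for illustration): `not` needs a single truth value for the whole Series, and `is` only checks object identity.

```python
import pandas as pd

# Hypothetical boolean Series, standing in for df['my_col'].isnull()
s = pd.Series([True, False, True])

# `not` asks for one truth value for the whole Series;
# pandas refuses because the answer would be ambiguous.
try:
    not s
    not_raised = False
except ValueError:
    not_raised = True

# `is` is an identity check: a Series object is never the singleton False,
# so this gives one plain bool instead of an element-wise mask.
identity_result = s is False

print(not_raised)       # True
print(identity_result)  # False
```

Neither form produces a boolean mask, so neither can be used to index the DataFrame.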

However, Python also has the overloadable operators ~ (not), & (and), | (or) and ^ (xor).

You may recognise these as the bitwise operators for int, but NumPy (and therefore pandas) uses them for element-wise boolean operations on arrays and Series.

For example:

import numpy as np

# & combines the two boolean arrays element-wise
b = np.array([True, False, True]) & np.array([True, False, False])
# b --> [True False False]
# ~ negates each element
b = ~b
# b --> [False True True]
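The same operators work on pandas Series. A small sketch, assuming a made-up DataFrame with a second column `other` that is not part of the question: comparisons need parentheses, because &, | and ^ bind more tightly than >, == and friends.

```python
import pandas as pd

# Hypothetical data; only 'my_col' comes from the question
df = pd.DataFrame({"my_col": [1.0, None, 3.0], "other": [10, 20, 30]})

not_null = df["my_col"].notnull()   # [True, False, True]
big = df["other"] > 15              # [False, True, True]

# Parenthesise comparisons when combining them inline
both = df["my_col"].notnull() & (df["other"] > 15)
either = not_null | big
exactly_one = not_null ^ big

print(both.tolist())         # [False, False, True]
print(either.tolist())       # [True, True, True]
print(exactly_one.tolist())  # [True, True, False]
```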

Hence what you want is:

df = df[~df['my_col'].isnull()]

I agree with PEP 8: don't compare with == False.
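For what it's worth, the notnull() version from the question selects the same rows as the ~ version. A quick check on made-up data (the column `other` and the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data; None in a float column is stored as NaN
df = pd.DataFrame({"my_col": [1.0, None, 3.0], "other": ["a", "b", "c"]})

via_invert = df[~df["my_col"].isnull()]     # negate the isnull() mask
via_notnull = df[df["my_col"].notnull()]    # the asker's current solution
via_dropna = df.dropna(subset=["my_col"])   # drop rows where my_col is missing

print(via_invert.index.tolist())        # [0, 2]
print(via_invert.equals(via_notnull))   # True
print(via_invert.equals(via_dropna))    # True
```

All three avoid the == False comparison, so which one to use is mostly a matter of readability.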

answered Nov 07 '22 12:11 by FHTMitchell