Proper way to use "opposite boolean" in Pandas data frame boolean indexing

I wanted to use boolean indexing to select the rows of my data frame where a particular column does not have NaN values. So, I did the following:

import pandas as pd
my_df.loc[pd.isnull(my_df['col_of_interest']) == False].head()

to see a snippet of that data frame, including only the values that are not NaN (most values are NaN).

It worked, but seems less-than-elegant. I'd want to type:

my_df.loc[!pd.isnull(my_df['col_of_interest'])].head()

However, that generated an error. I also spend a lot of time in R, so maybe I'm confusing things. In Python, I usually use "not" where I can, for instance if x is not None:, but I couldn't really do that here. Is there a more elegant way? I don't like having to put in a senseless comparison.

Mike Williamson asked Nov 04 '15 01:11



1 Answer

In general with pandas (and numpy), we use the bitwise NOT ~ instead of ! or not. Python has no ! operator, and the behaviour of not can't be overridden by types to work element-wise.
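To make the difference concrete, here is a minimal sketch (the Series s and the variable names are made up for illustration). Applying not asks the whole Series for a single truth value, which pandas refuses with a ValueError, while ~ is dispatched to the Series and flips each element:

```python
import pandas as pd

# Hypothetical boolean Series, used only for illustration.
s = pd.Series([True, False, True])

# `not` needs one truth value for the whole Series; pandas raises ValueError.
try:
    result = not s
    not_error = None
except ValueError as exc:
    not_error = exc

# `~` calls Series.__invert__, which negates element by element.
inverted = ~s

print(type(not_error).__name__)
print(inverted.tolist())
```

The same reasoning applies to and/or versus & and |: the keyword forms need a single truth value, the operator forms work element-wise.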

While in this case we have notnull, ~ can come in handy in situations where there's no special opposite method.
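As a sketch of such a situation, isin has no built-in "not in" counterpart, so negating its mask with ~ is the usual route. The frame, column names and values below are invented for the example:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Pune", "Lima"],
                   "n": [1, 2, 3, 4]})

# Boolean mask: True where city is "Lima".
mask = df["city"].isin(["Lima"])

# Negate the mask to keep every row that is NOT "Lima".
not_lima = df.loc[~mask]

# ~ binds tighter than &, but the comparison still needs parentheses.
not_lima_and_big = df.loc[~mask & (df["n"] > 1)]

print(not_lima)
print(not_lima_and_big)
```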

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2, np.nan, 3]})
>>> df.a.isnull()
0    False
1    False
2     True
3    False
Name: a, dtype: bool
>>> ~df.a.isnull()
0     True
1     True
2    False
3     True
Name: a, dtype: bool
>>> df.a.notnull()
0     True
1     True
2    False
3     True
Name: a, dtype: bool
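Applied to the frame from the question, this might look like the sketch below. The contents of my_df are invented here (only the names my_df and col_of_interest come from the question), and dropna is an extra alternative not mentioned above, shown only for comparison:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame; the values are made up.
my_df = pd.DataFrame({"col_of_interest": [np.nan, 1.5, np.nan, 2.5],
                      "other": ["a", "b", "c", "d"]})

# Negate the isnull mask with ~ instead of comparing to False.
via_invert = my_df.loc[~my_df["col_of_interest"].isnull()]

# Use the dedicated opposite method.
via_notnull = my_df.loc[my_df["col_of_interest"].notnull()]

# Alternative: drop rows whose col_of_interest is NaN.
via_dropna = my_df.dropna(subset=["col_of_interest"])

print(via_invert.head())
```

All three select the same rows here, so the choice is mostly a matter of readability.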

(For completeness I'll note that -, the unary negative operator, may also work on a boolean Series depending on the pandas version, but ~ is the canonical choice; - was deprecated for numpy boolean arrays and now raises a TypeError there.)

DSM answered Sep 18 '22 14:09