
Pandas: boolean indexing with 'item in list' syntax

Tags: python, pandas

Say I have a DataFrame with a column called col1. If I want to get all rows where col1 == 'a', I can do that with:

df[df.col1 == 'a']

If I want rows where col1 is 'a' or 'b', I can do:

df[(df.col1 == 'a') | (df.col1 == 'b')]

But I’d really like to do something like this, which doesn’t work:

df[df.col1 in ['a', 'b', 'c']]

Is there a proper pandas way to do that?

Here’s what I’m using instead:

sort_func = lambda x: x in ['a', 'b', 'c']
mask = df['col1'].apply(sort_func)
df[mask]

But… is there a better way to do this? This is bothering me.
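For reference, a minimal sketch of why the `in` expression fails and what the `apply` workaround produces. The sample frame and the names `wanted` and `raised` are assumptions chosen for illustration. The expression is valid Python syntax, but the membership test tries to reduce a whole boolean Series to a single True/False, which pandas refuses to do.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame([['a', 1], ['b', 2], ['c', 3], ['d', 4]],
                  columns=['col1', 'col2'])
wanted = ['a', 'b', 'c']

# `series in list` compares each list item to the whole Series, then asks
# for the truth value of the resulting boolean Series, which raises.
try:
    df['col1'] in wanted
    raised = False
except ValueError:
    raised = True

# The workaround from the question: test membership one element at a time.
mask = df['col1'].apply(lambda x: x in wanted)
filtered = df[mask]
```

The element-wise mask gives the intended rows, at the cost of calling a Python function once per row.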

asked Oct 26 '15 17:10 by J Jones



1 Answer

Use isin() for filtering

In [212]: df = pd.DataFrame([['a', 1], ['b', 2], ['c', 3], ['d', 4]],
                            columns=['col1', 'col2'])


In [213]: df['col1'].isin(['a', 'b', 'c'])
Out[213]:
0     True
1     True
2     True
3    False
Name: col1, dtype: bool

In [214]: df.loc[df['col1'].isin(['a', 'b', 'c']), :]
Out[214]:
  col1  col2
0    a     1
1    b     2
2    c     3
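
As a possible follow-up to the session above, here is a short sketch of two related patterns: negating the mask with `~` to get the rows that are not in the list, and expressing the same filter with `DataFrame.query`. The variable names (`wanted`, `selected`, `excluded`, `via_query`) are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame([['a', 1], ['b', 2], ['c', 3], ['d', 4]],
                  columns=['col1', 'col2'])
wanted = ['a', 'b', 'c']

# Boolean mask: True where col1 is one of the wanted values.
mask = df['col1'].isin(wanted)

selected = df[mask]    # rows whose col1 is in the list
excluded = df[~mask]   # rows whose col1 is NOT in the list

# query() accepts an `in` expression; @wanted refers to the local variable.
via_query = df.query('col1 in @wanted')
```

Both `selected` and `via_query` should hold the same three rows; which form reads better is largely a matter of taste.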
answered Oct 18 '22 16:10 by Zero