Filter pandas DataFrame by membership in set-of-tags

Let's say I have a DataFrame in which one column holds a list or set of tags for each row, and I want to keep only the rows that contain a certain tag. What is the most idiomatic way to achieve this with pandas?

import pandas as pd

df = pd.DataFrame({
    'amount': [15, 20, 40],
    'tags': [["Food", "Eating Out"], ["Food", "Groceries"], ["Clothes"]],
    'description': ["Garfunkel's", "Tesco", "Hollister"]
})

I have this piece of code that works, but is rather clunky to write:

criterion = lambda row: 'Food' in row['tags']
df[df.apply(criterion, axis=1)]

The result should be the two rows tagged 'Food' (Garfunkel's and Tesco).

You can apply a lambda to only the relevant column, instead of the whole row:

df[df['tags'].map(lambda tags: 'Food' in tags)]
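As a rough sketch of the same idea on the question's sample data, the boolean mask can be built once and reused. The second filter, which keeps rows matching any of several tags, is an assumed extension rather than something the question asked for.

```python
import pandas as pd

df = pd.DataFrame({
    'amount': [15, 20, 40],
    'tags': [["Food", "Eating Out"], ["Food", "Groceries"], ["Clothes"]],
    'description': ["Garfunkel's", "Tesco", "Hollister"]
})

# Boolean mask: True where the row's tag list contains 'Food'
mask = df['tags'].map(lambda tags: 'Food' in tags)
food = df[mask]

# Assumed extension: keep rows that carry at least one of several tags
wanted = {'Groceries', 'Clothes'}
any_match = df[df['tags'].map(lambda tags: not wanted.isdisjoint(tags))]
```

This is an exact membership test on each list, so a tag such as 'Fast Food' would not match 'Food'.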

For efficiency: scanning every row's list of string tags each time you do logical indexing is slow. So:

Expand df['tags'] into multiple columns. Either:

if there are at most T distinct tags overall, add T boolean columns, e.g. df['tFood'] = ['Food' in tt for tt in df['tags']]
if each item can have at most N tags and N is small, add string columns tag1, tag2, ..., tagN. In fact, you can convert those strings to Categoricals, so there is no need to string-match every time.
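The first option could be sketched as follows for every tag at once. The 't' prefix follows the tFood example; the name all_tags is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'amount': [15, 20, 40],
    'tags': [["Food", "Eating Out"], ["Food", "Groceries"], ["Clothes"]],
    'description': ["Garfunkel's", "Tesco", "Hollister"]
})

# Collect every distinct tag that occurs anywhere in the column
all_tags = sorted({tag for tags in df['tags'] for tag in tags})

# One boolean column per tag, named 't' + tag (e.g. 'tFood')
for tag in all_tags:
    df['t' + tag] = [tag in tags for tags in df['tags']]
```

The lists are scanned once, up front; later filters only read the precomputed boolean columns.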

Now, you can do logical indexing quickly:

df.loc[df['tFood']]
#    amount  description                tags  tFood
# 0      15  Garfunkel's  [Food, Eating Out]   True
# 1      20        Tesco   [Food, Groceries]   True
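The second option (columns tag1...tagN stored as Categoricals) might look roughly like this. It is a sketch under the assumption that N is simply the longest tag list in the data; the names wide and all_tags are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'amount': [15, 20, 40],
    'tags': [["Food", "Eating Out"], ["Food", "Groceries"], ["Clothes"]],
    'description': ["Garfunkel's", "Tesco", "Hollister"]
})

# Spread each tag list over fixed columns; shorter lists are padded with missing values
wide = pd.DataFrame(df['tags'].tolist(), index=df.index)
wide.columns = [f'tag{i + 1}' for i in range(wide.shape[1])]

# Use one shared category set so every tag column has the same dtype
all_tags = sorted({tag for tags in df['tags'] for tag in tags})
for col in wide.columns:
    wide[col] = pd.Categorical(wide[col], categories=all_tags)

df = df.join(wide)

# A row matches if any of its tag columns equals 'Food'
mask = pd.Series(False, index=df.index)
for col in wide.columns:
    mask |= (df[col] == 'Food')
food_rows = df[mask]
```

Missing slots compare as False, so rows with fewer than N tags need no special handling.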

Try this. It's not a perfect solution, but it works.

print(df[df['tags'].astype(str).str.contains('Food')])

You can even use regular expressions in contains() to match multiple patterns.
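For instance, an alternation pattern matches rows carrying either of two tags. This is only a sketch on the question's sample data, and the pattern itself is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'amount': [15, 20, 40],
    'tags': [["Food", "Eating Out"], ["Food", "Groceries"], ["Clothes"]],
    'description': ["Garfunkel's", "Tesco", "Hollister"]
})

# Each list is converted to its string form, e.g. "['Food', 'Groceries']",
# then searched with a regular expression (str.contains uses regex by default)
matches = df[df['tags'].astype(str).str.contains('Groceries|Clothes')]
```

Note that this is substring matching on the stringified list, so a pattern like 'Food' would also match a tag such as 'Fast Food'.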

Tags:

python

pandas

filter

passy

3 Answers

Marius

smci

Charan Reddy
