I have a dataframe as given below.
import pandas as pd
raw_data = {'score': [1,2,3],
'tags': [['apple','pear','guava'],['truck','car','plane'],['cat','dog','mouse']]}
df = pd.DataFrame(raw_data, columns = ['score', 'tags'])
df.query("score==1")
gives the first row as result.
But df.query("tags='apple'")
gives error.
How to write query for the column 'tags'.
You cannot use pd.DataFrame.query
to test membership of a string in lists within a series of lists. Holding lists in Pandas dataframes is not recommended as you lose vectorised functionality.
With your existing dataframe, you can instead calculate a mask using pd.Series.apply
:
res = df[df['tags'].apply(lambda x: 'apple' in x)]
print(res)
score tags
0 1 [apple, pear, guava]
Or you can use a list comprehension:
res = df[['apple' in x for x in df['tags']]]
A third option is to use set
:
res = df[df['tags'].apply(set) >= {'apple'}]
The last option, although expensive, may suit when you are testing for existence of multiple tags. In each case, we are constructing a Boolean series, which we then use to mask the dataframe.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With