I have a dataframe as given below.
import pandas as pd
raw_data = {'score': [1,2,3], 
        'tags': [['apple','pear','guava'],['truck','car','plane'],['cat','dog','mouse']]}
df = pd.DataFrame(raw_data, columns = ['score', 'tags'])
df.query("score==1") gives the first row as result.
But df.query("tags='apple'") gives error. 
How to write query for the column 'tags'.
You cannot use pd.DataFrame.query to test membership of a string in lists within a series of lists. Holding lists in Pandas dataframes is not recommended as you lose vectorised functionality.
With your existing dataframe, you can instead calculate a mask using pd.Series.apply:
res = df[df['tags'].apply(lambda x: 'apple' in x)]
print(res)
   score                  tags
0      1  [apple, pear, guava]
Or you can use a list comprehension:
res = df[['apple' in x for x in df['tags']]]
A third option is to use set:
res = df[df['tags'].apply(set) >= {'apple'}]
The last option, although expensive, may suit when you are testing for existence of multiple tags. In each case, we are constructing a Boolean series, which we then use to mask the dataframe.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With