Identify certain words in pandas columns

Question

I have a tsv file as follows.

id     ingredients             recipe
code1  egg, butter             beat eggs. add unsalted butter
code2  tim tam, butter         beat tim tam. add butter
code3  coffee, sugar           add coffee and sugar and mix
code4  sugar, fresh goat milk  beat sugar and milk together

I want to remove every row that contains any of the phrases below in either the ingredients or the recipe column.

mylist = ['tim tam', 'unsalted butter', 'fresh goat milk']

My output should look as follows.

id     ingredients    recipe
code3  coffee, sugar  add coffee and sugar and mix

Is there a way to do this using pandas? Please help me!

Scott Boston · Accepted Answer

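Use str.contains with a pattern built by joining the phrases with '|', which gives a regex that matches any one of them as a substring. Apply it to both columns, combine the two boolean masks with |, and negate the result with ~ so that only rows where neither column matches are kept: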

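import re
mylist = ['tim tam', 'unsalted butter', 'fresh goat milk']
pattern = '|'.join(map(re.escape, mylist))  # regex matching any phrase
# keep rows where neither column matches
df[~(df.ingredients.str.contains(pattern) |
     df.recipe.str.contains(pattern))]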

Output:

     id    ingredients                        recipe
2  code3  coffee, sugar  add coffee and sugar and mix
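
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes the file is tab-separated with the three columns shown in the question; the inline tsv_text stands in for the real file, and the helper name drop_rows_with_phrases is made up for illustration. It also escapes the phrases with re.escape and passes na=False so that missing values count as "no match", which the snippet above does not do.

```python
import io
import re

import pandas as pd

# Stand-in for the real TSV file (the question does not give a path).
tsv_text = (
    "id\tingredients\trecipe\n"
    "code1\tegg, butter\tbeat eggs. add unsalted butter\n"
    "code2\ttim tam, butter\tbeat tim tam. add butter\n"
    "code3\tcoffee, sugar\tadd coffee and sugar and mix\n"
    "code4\tsugar, fresh goat milk\tbeat sugar and milk together\n"
)
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

mylist = ["tim tam", "unsalted butter", "fresh goat milk"]


def drop_rows_with_phrases(frame, phrases, columns):
    """Hypothetical helper: drop rows where any column contains any phrase."""
    if not phrases:
        # An empty pattern would match every row, so return the frame as-is.
        return frame
    # Escape regex metacharacters, then join into one alternation pattern.
    pattern = "|".join(re.escape(p) for p in phrases)
    hit = pd.Series(False, index=frame.index)
    for col in columns:
        # na=False treats missing values as non-matching.
        hit |= frame[col].str.contains(pattern, na=False)
    return frame[~hit]


result = drop_rows_with_phrases(df, mylist, ["ingredients", "recipe"])
print(result)
```

Note that this is still substring matching, so 'tim tam' would also match inside 'tim tams'. If whole-word matching is needed, wrapping the pattern in \b word boundaries is one option.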

Tags:

python

pandas

