Identify certain words in pandas columns

Tags:

python

pandas

I have a tsv file as follows.

id    ingredients    recipe
code1  egg, butter   beat eggs. add unsalted butter
code2  tim tam, butter  beat tim tam. add butter
code3  coffee, sugar   add coffee and sugar and mix
code4  sugar, fresh goat milk   beat sugar and milk together

I want to remove the rows that contain any of the words below in either the ingredients or the recipe column.

mylist = ['tim tam', 'unsalted butter', 'fresh goat milk']

My output should look as follows.

id    ingredients    recipe
code3  coffee, sugar   add coffee and sugar and mix

Is there a way to do this using pandas? Please help me!


1 Answer

Use str.contains to check whether each string contains any of the substrings, joining the list with '|' to build a regex that matches any one of them:

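mylist = ['tim tam','unsalted butter','fresh goat milk']
pattern = '|'.join(mylist)  # regex alternation: matches any of the phrases
# keep only the rows where neither column contains a listed phrase
df[~(df.ingredients.str.contains(pattern) |
     df.recipe.str.contains(pattern))]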

Output:

     id    ingredients                        recipe
2  code3  coffee, sugar  add coffee and sugar and mix
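
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. It makes a few assumptions that are not in the question: the DataFrame is built inline instead of being read from the TSV file, each phrase is passed through re.escape so that regex metacharacters are matched literally, and na=False is used so that missing values count as "no match". The names pattern, has_banned and result are only illustrative.

```python
import re

import pandas as pd

# Sample data mirroring the table in the question (built inline, not read from the TSV file).
df = pd.DataFrame({
    "id": ["code1", "code2", "code3", "code4"],
    "ingredients": ["egg, butter", "tim tam, butter", "coffee, sugar", "sugar, fresh goat milk"],
    "recipe": [
        "beat eggs. add unsalted butter",
        "beat tim tam. add butter",
        "add coffee and sugar and mix",
        "beat sugar and milk together",
    ],
})

mylist = ["tim tam", "unsalted butter", "fresh goat milk"]

# Escape each phrase so characters such as '.' or '+' are matched literally,
# then join with '|' to form a regex alternation.
pattern = "|".join(re.escape(phrase) for phrase in mylist)

# True for rows where either column contains any listed phrase.
# na=False makes missing values count as "no match" instead of propagating NaN.
has_banned = (
    df["ingredients"].str.contains(pattern, na=False)
    | df["recipe"].str.contains(pattern, na=False)
)

# Keep only the rows without a match.
result = df[~has_banned]
print(result)
```

With real data, the DataFrame would probably come from something like pd.read_csv(path, sep='\t'), assuming the file really is tab-separated. Escaping makes no difference for the three phrases here, but it guards against list entries that happen to contain regex syntax.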
Scott Boston answered Nov 28 '25 07:11
