How can I use NLP to parse recipe ingredients?

Tags:

parsing

nlp

I need to parse recipe ingredients into amount, measurement, item, and description, as applicable to each line, for example "1 cup flour", "the peel of 2 lemons", and "1 cup packed brown sugar". What would be the best way of doing this? I am interested in using Python for the project, so I am assuming NLTK is the best bet, but I am open to other languages.

Greg asked Oct 15 '08 03:10


4 Answers

I actually do this for my website, which is now part of an open source project for others to use.

I wrote a blog post on my techniques, enjoy!

http://blog.kitchenpc.com/2011/07/06/chef-watson/

Mike Christensen answered Nov 19 '22 22:11


The New York Times faced this problem when they were parsing their recipe archive. They used an NLP technique called a linear-chain conditional random field (CRF). This blog post provides a good overview:

  • "Extracting Structured Data From Recipes Using Conditional Random Fields"

They open-sourced their code, but quickly abandoned it. I maintain the most up-to-date version of it and I wrote a bit about how I modernized it.
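To make the CRF idea a bit more concrete, here is a minimal sketch of the kind of input such a model is trained on: each ingredient line is split into tokens, each token gets a feature dictionary, and each token carries a hand-assigned label. The label names (`QTY`, `UNIT`, `COMMENT`, `NAME`) and the feature set are assumptions made for illustration, not the feature set from the New York Times code, and no actual CRF training is shown.

```python
def token_features(tokens, i):
    """Build a feature dict for the token at position i (illustrative features only)."""
    token = tokens[i]
    return {
        "lower": token.lower(),
        "is_numeric": token[0].isdigit(),
        "is_capitalized": token[0].isupper(),
        "position": i,
        # Neighbouring tokens give the model the context a linear chain relies on.
        "prev": tokens[i - 1].lower() if i > 0 else "<START>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }


def line_to_features(line):
    """Turn one ingredient line into a list of per-token feature dicts."""
    tokens = line.split()
    return [token_features(tokens, i) for i in range(len(tokens))]


# One hand-labelled training example: one label per token (hypothetical label names).
example_tokens = "1 cup packed brown sugar".split()
example_labels = ["QTY", "UNIT", "COMMENT", "NAME", "NAME"]
example_features = line_to_features("1 cup packed brown sugar")
```

A CRF library would then be trained on many such (features, labels) pairs and asked to predict the label sequence for unseen lines.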

If you're looking for a ready-made solution, several companies offer ingredient parsing as a service:

  • Zestful (full disclosure: I'm the author)
  • Spoonacular
  • Edamam

mtlynch answered Nov 19 '22 21:11


I guess this is a few years out, but I was thinking of doing something similar myself and came across this, so thought I might have a stab at it in case it is useful to anyone else in future.

Even though you say you want to parse free text, most recipes have a pretty standard format for their ingredient lists: each ingredient is on a separate line, and exact sentence structure is rarely all that important. The range of vocab is relatively small as well.

One way might be to check each line for words which might be nouns and words/symbols which express quantities. I think WordNet may help with seeing if a word is likely to be a noun or not, but I've not used it before myself. Alternatively, you could use http://en.wikibooks.org/wiki/Cookbook:Ingredients as a word list, though again, I wouldn't know exactly how comprehensive it is.

The other part is to recognise quantities. These come in a few different forms, but few enough that you could probably create a list of keywords. In particular, make sure you have good error reporting. If the program can't fully parse a line, get it to report back to you what that line is, along with what it has/hasn't recognised so you can adjust your keyword lists accordingly.
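As a rough sketch of the quantity recognition and error reporting described above, the following parses a leading amount and unit and records what it could not recognise. The unit list, the function names, and the shape of the result dict are all assumptions for illustration; a real keyword list would grow as unparsed lines are reported.

```python
import re
from fractions import Fraction

# Hypothetical keyword list; extend it as unparsed lines are reported.
UNITS = {"cup", "cups", "tbsp", "tsp", "tablespoon", "tablespoons",
         "teaspoon", "teaspoons", "g", "kg", "ml", "l", "oz", "lb", "dash", "pinch"}

# Matches a mixed number ("1 1/2"), a fraction ("3/4"), or a decimal/integer ("2.5", "1")
# at the start of a line.
QUANTITY_RE = re.compile(r"^\s*(\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)\s*")


def parse_quantity(text):
    """Convert '1 1/2', '3/4' or '2.5' to a Fraction."""
    return sum(Fraction(part) for part in text.split())


def parse_line(line):
    """Return the recognised fields plus a list of things that were not recognised."""
    result = {"amount": None, "unit": None, "rest": line.strip(), "problems": []}
    match = QUANTITY_RE.match(line)
    if match:
        result["amount"] = parse_quantity(match.group(1))
        result["rest"] = line[match.end():].strip()
    else:
        result["problems"].append("no leading quantity")
    words = result["rest"].split()
    if words and words[0].lower().rstrip(".") in UNITS:
        result["unit"] = words[0].lower().rstrip(".")
        result["rest"] = " ".join(words[1:])
    else:
        result["problems"].append("no known unit")
    return result


def report_unparsed(lines):
    """Collect (line, problems) pairs so the keyword lists can be adjusted."""
    parsed = ((line, parse_line(line)) for line in lines)
    return [(line, result["problems"]) for line, result in parsed if result["problems"]]
```

With this sketch, "1 1/2 cups packed brown sugar" parses cleanly, while "the peel of 2 lemons" is reported back with both problems, which is the signal to handle that pattern or extend the lists.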

Aaanyway, I'm not guaranteeing any of this will work (and it's almost certain not to be 100% reliable), but that's how I'd start to approach the problem.

BigglesB answered Nov 19 '22 22:11


This is an incomplete answer, but you're looking at writing a free-text parser, which, as you know, is non-trivial :)

Some ways to cheat, using knowledge specific to cooking:

  1. Construct lists of words for the "adjectives" and "verbs", and filter against them
    1. measurement units form a closed set, using words and abbreviations like {L., c, cup, t, dash}
    2. instructions -- cut, dice, cook, peel. Things that come after this are almost certain to be ingredients
  2. Remember that you're mostly looking for nouns, and you can take a labeled list of non-nouns (from WordNet, for example) and filter against them.
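The filtering idea in the list above could be sketched roughly like this: tokens that match a closed word list are bucketed as units or preparation words, and whatever is left over is treated as the likely item. The word lists here are tiny and hypothetical, and lowercasing loses the "t" versus "T" (teaspoon versus tablespoon) distinction, so treat this as a starting point rather than a working parser.

```python
# Hypothetical, deliberately small word lists; real ones would be much longer.
UNITS = {"l", "c", "cup", "cups", "t", "dash"}
PREPARATIONS = {"cut", "diced", "cooked", "peeled", "packed", "sifted", "minced"}
STOPWORDS = {"of", "the", "a", "an", "and", "or"}


def classify_tokens(line):
    """Bucket each token by filtering against the closed word lists."""
    buckets = {"amount": [], "unit": [], "description": [], "item": []}
    for token in line.replace(",", " ").split():
        word = token.lower().rstrip(".")
        if not word or word in STOPWORDS:
            continue
        if word[0].isdigit():
            buckets["amount"].append(word)
        elif word in UNITS:
            buckets["unit"].append(word)
        elif word in PREPARATIONS:
            buckets["description"].append(word)
        else:
            # Whatever survives the filters is assumed to be (part of) the ingredient.
            buckets["item"].append(word)
    return buckets
```

For "1 cup packed brown sugar" this leaves "brown" and "sugar" in the item bucket; a labelled list of non-nouns (from WordNet, for example) could be added as one more filter before that final fallback.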

If you're more ambitious, you can look in the NLTK Book at the chapter on parsers.

Good luck! This sounds like a mostly doable project!

Gregg Lind answered Nov 19 '22 22:11