Extract grocery list out of free text

Tags: python, nlp, nltk

I am looking for a Python library, algorithm, or paper for extracting a list of groceries from free text.

For example:

"One salad and two beers"

Should be converted to:

{'salad':1, 'beer': 2}
Uri Goren asked Jul 17 '16 08:07


2 Answers

In [1]: from word2number import w2n
In [2]: print(w2n.word_to_num("One"))
1
In [3]: print(w2n.word_to_num("Two"))
2
In [4]: print(w2n.word_to_num("Thirty five"))
35

This package converts number words to integers; the rest you can implement to suit your needs.

To install the package:

pip install word2number

Update

You could implement it like this:

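from word2number import w2n

text = "One salad and two beers"
words = text.split()
result = {}
# Pair each number word with the word that follows it.
for pos, word in enumerate(words[:-1]):
    try:
        count = w2n.word_to_num(word)
    except ValueError:  # recent word2number releases raise this for non-number words
        continue
    result[words[pos + 1]] = count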

Result

{'beers': 2, 'salad': 1}
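Note that this result is keyed on 'beers', while the question asks for 'beer'. Below is a minimal sketch of one way to close that gap, using only the standard library. The `NUMBER_WORDS` table and the `to_count`, `singular`, and `extract_groceries` helpers are hypothetical names made up for illustration. The table stands in for word2number, and the singularizer is deliberately naive. Compound numbers such as "thirty five" are not handled.

```python
import re

# Small hypothetical lookup table; word2number covers far more cases.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def to_count(token):
    """Return the integer a token stands for, or None if it is not a quantity."""
    if token.isdigit():
        return int(token)
    return NUMBER_WORDS.get(token)


def singular(noun):
    """Naive singularizer: strips one trailing 's' (wrong for e.g. 'tomatoes')."""
    if len(noun) > 1 and noun.endswith("s") and not noun.endswith("ss"):
        return noun[:-1]
    return noun


def extract_groceries(text):
    """Map each item to its quantity, assuming the item directly follows the number."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    groceries = {}
    for pos in range(len(tokens) - 1):
        count = to_count(tokens[pos])
        if count is None:
            continue
        item = singular(tokens[pos + 1])
        # Sum the quantities if the same item is mentioned more than once.
        groceries[item] = groceries.get(item, 0) + count
    return groceries


print(extract_groceries("One salad and two beers"))  # {'salad': 1, 'beer': 2}
```

For real input, a proper lemmatizer would probably be a better choice than stripping a trailing 's'.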

Rahul K P answered Oct 23 '22 03:10


I suggest using WordNet. You can call it from Java (for example through the JWNL library) or from other languages. Here is the suggestion: for each single word, check its hypernyms. For edibles, near the top of the hypernym hierarchy you will find "food, nutrient", which is probably what you want. To test this, look up the word "beer" in the online version of WordNet, click on the "S", and then click on "inherited hypernym". You will find this somewhere in the hierarchy:

....
    S: (n) beverage, drink, drinkable, potable (any liquid suitable for drinking) "may I take your beverage order?"
        S: (n) food, nutrient (any substance that can be metabolized by an animal to give energy and build tissue) 
          ....

You can traverse this hierarchy in the programming language of your choice. Once you have flagged all the edibles, you can catch the number, i.e. the 2 in "2 beers", and you have all the information you need. Note that catching the numbers can by itself be a decent coding task! Hope it helps!
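The hypernym check described above can be sketched in Python as a simple graph walk. To keep the example self-contained, `TOY_HYPERNYMS` is a hand-written, simplified stand-in for WordNet, and `is_edible` is a hypothetical helper name. Only the beverage-to-food link comes from the fragment quoted above; the other entries are illustrative. With NLTK, the lookup would instead go through `nltk.corpus.wordnet` (`wordnet.synsets(word)` and each synset's `hypernyms()`), which requires the WordNet corpus to be downloaded.

```python
# Toy hypernym table standing in for WordNet (illustrative, simplified data).
TOY_HYPERNYMS = {
    "beer": ["brew"],
    "brew": ["alcohol"],
    "alcohol": ["beverage"],
    "beverage": ["food"],
    "salad": ["dish"],
    "dish": ["nutriment"],
    "nutriment": ["food"],
    "table": ["furniture"],
    "furniture": ["artifact"],
}


def is_edible(word, hypernyms=TOY_HYPERNYMS, root="food"):
    """Return True if `root` is reachable from `word` by following hypernym links."""
    seen = set()
    stack = [word]
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current in seen:
            continue  # guard against cycles and repeated work
        seen.add(current)
        stack.extend(hypernyms.get(current, []))
    return False


words = "one salad and two beer on the table".split()
print([w for w in words if is_edible(w)])  # ['salad', 'beer']
```

Words missing from the table are treated as not edible. A real word can have several senses, so with WordNet you would also have to decide which synsets to consider.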

user3639557 answered Oct 23 '22 05:10