NLTK Thinks that Imperatives are Nouns

Tags:

nltk

I'm using NLTK's part-of-speech tagger (nltk.pos_tag) on recipes. A problem I'm having is that it tags words in the imperative mood as nouns. Shouldn't they be tagged as verbs? For example:

With the input:

    combine 1 1/2 cups floud, 3/4 cup sugar, salt and baking powder

The output is:

    [('combine', 'NN'), ('1', 'CD'), ('1/2', 'CD'), ('cups', 'NNS'), ('floud', 'VBD'), (',', ','), ('3/4', 'CD'), ('cup', 'NN'), ('sugar', 'NN'), (',', ','), ('salt', 'NN'), ('and', 'CC'), ('baking', 'VBG'), ('powder', 'NN')]

Here's the code I'm using for this:

    import nltk

    def part_of_speech(self, input_sentence):
        # Split the sentence into tokens, then tag them with NLTK's default tagger
        tokens = nltk.word_tokenize(input_sentence)
        return nltk.pos_tag(tokens)

Shouldn't 'combine' be tagged as some sort of verb? Is this a fault in NLTK, or am I doing something wrong?

asked Feb 23 '12 02:02

2 Answers

What you're seeing is a very common problem in traditional statistical natural language processing (NLP). In short, the data you are using the tagger on doesn't look like the data it was trained on. NLTK doesn't document the details, but as far as I know the default tagger is trained on Wall Street Journal articles, the Brown Corpus, or some combination of the two. These corpora contain very few imperatives, so when you give it text full of imperatives, it tends to mistag the verbs, as in your example.

A good long-term solution would be to correct the tags for a large corpus of recipes and train on the corrected data; that way you remove the mismatch between the training and test data. This is, however, a huge amount of work. Ideally, a corpus with lots of imperatives would already exist; my research group has looked for one and has not found a suitable one, although we are in the process of producing one.
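
A rough sketch of the retraining idea, under heavy assumptions: the two sentences below are made-up stand-ins for a hand-corrected recipe corpus, and NLTK's simple UnigramTagger (backing off to DefaultTagger('NN') for unseen words) is used in place of the statistical model behind nltk.pos_tag. A unigram lookup tagger ignores context, so this only illustrates the workflow of training on corrected data; a real project would need far more data and a stronger model. Neither class needs any downloaded NLTK data.

```python
import nltk

# Hypothetical hand-corrected training data: each sentence is a list of
# (word, tag) pairs in which the sentence-initial imperative is tagged VB.
corrected_recipes = [
    [('combine', 'VB'), ('the', 'DT'), ('flour', 'NN'), ('and', 'CC'),
     ('sugar', 'NN'), ('.', '.')],
    [('stir', 'VB'), ('in', 'IN'), ('1', 'CD'), ('cup', 'NN'),
     ('milk', 'NN'), ('.', '.')],
]

# Words never seen in training fall back to the NN tag.
backoff = nltk.DefaultTagger('NN')

# The unigram tagger gives each known word the tag it most often had in training.
recipe_tagger = nltk.UnigramTagger(corrected_recipes, backoff=backoff)

print(recipe_tagger.tag(['combine', 'the', 'salt', '.']))
# [('combine', 'VB'), ('the', 'DT'), ('salt', 'NN'), ('.', '.')]
```

Because 'combine' was tagged VB in the corrected sample, the trained tagger returns VB for it, while the unseen word 'salt' falls through to the NN backoff.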

A much simpler solution, which I've used on a recent project that needed imperatives to be understood correctly, is to list the imperative verbs you care about and force the tags for those words to be correct.

So in the example below, I made a dictionary saying that "combine" should be treated as a verb, and then used a list comprehension to change the tags.

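    # Tagger output (abbreviated) for the recipe sentence
    tagged_words = [('combine', 'NN'), ('1', 'CD'), ('1/2', 'CD'), ('cups', 'NNS'), ('flour', 'VBD')]
    # Words whose tag should always be overridden
    force_tags = {'combine': 'VB'}
    # Keep the tagger's tag unless the word has an entry in force_tags
    new_tagged_words = [(word, force_tags.get(word, tag)) for word, tag in tagged_words]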

new_tagged_words now holds the original tags, except where a word has an entry in force_tags.

    >>> new_tagged_words
    [('combine', 'VB'), ('1', 'CD'), ('1/2', 'CD'), ('cups', 'NNS'), ('flour', 'VBD')]

This solution does require you to list the words you want to force to verbs. This is far from ideal, but short of retraining there isn't a better general solution.
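
A variation on the dictionary approach, sketched under the assumption that recipe imperatives almost always start an instruction: override the tag only for the first token of each tagged sentence, so a listed word used as a noun later in the sentence keeps its original tag. The helper force_initial_verbs and the verb list are hypothetical, and the input is assumed to be one tagged list per instruction (for example, nltk.pos_tag applied to each sentence separately).

```python
from typing import Iterable, List, Set, Tuple

TaggedSentence = List[Tuple[str, str]]

# Hypothetical set of imperative verbs expected at the start of recipe steps.
IMPERATIVE_VERBS = {'add', 'bake', 'combine', 'mix', 'set', 'stir', 'whisk'}


def force_initial_verbs(sentences: Iterable[TaggedSentence],
                        verbs: Set[str] = IMPERATIVE_VERBS,
                        verb_tag: str = 'VB') -> List[TaggedSentence]:
    """Retag the first token of each sentence as a verb if it is a listed imperative."""
    fixed = []
    for sentence in sentences:
        sentence = list(sentence)  # copy so the caller's data is left untouched
        if sentence and sentence[0][0].lower() in verbs:
            word, _ = sentence[0]
            sentence[0] = (word, verb_tag)  # only the sentence-initial tag changes
        fixed.append(sentence)
    return fixed


steps = [
    [('combine', 'NN'), ('1', 'CD'), ('cup', 'NN'), ('sugar', 'NN')],
    [('set', 'VBN'), ('the', 'DT'), ('mix', 'NN'), ('aside', 'RB')],
]
print(force_initial_verbs(steps))
```

In the second step, 'set' is retagged as VB, while 'mix', which is also in the verb list, keeps its NN tag because it is not sentence-initial.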

answered Oct 26 '22 04:10

Constantine

Training on an imperative-heavy corpus would be the best option. But if you don't have the time, or don't think the effort is worth it, here is a simple solution (more of a hack): put a pronoun such as 'they' in front of every sentence you are sure is imperative. The default NLTK tagger then tags the leading word as a verb.
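
A minimal sketch of the pronoun trick, assuming the sentence is already tokenized and that tagger is any function shaped like nltk.pos_tag (a token list in, a list of (word, tag) pairs out). The helper name tag_imperative is hypothetical. It lowercases the first word before tagging, since a capitalised word after "they" may be read as a proper noun, and it drops the pronoun's tag before returning. After "they", the verb will probably come back with a present-tense tag such as VBP rather than the base-form VB, so map it if downstream code expects VB.

```python
from typing import Callable, List, Sequence, Tuple

Tagged = List[Tuple[str, str]]


def tag_imperative(tokens: Sequence[str],
                   tagger: Callable[[List[str]], Tagged]) -> Tagged:
    """Tag an imperative sentence by tagging 'they' + sentence, then dropping 'they'."""
    if not tokens:
        return []
    # Lowercase only the first word so it is not mistaken for a proper noun
    # after the inserted pronoun; the original spelling is restored below.
    prepared = ['they', tokens[0].lower()] + list(tokens[1:])
    tagged = tagger(prepared)
    # Skip the pronoun's tag and pair the remaining tags with the original tokens.
    return [(word, tag) for word, (_, tag) in zip(tokens, tagged[1:])]


# Usage with NLTK (requires the tokenizer and tagger models to be downloaded):
#   import nltk
#   tag_imperative(nltk.word_tokenize("Combine 1 cup sugar and salt"), nltk.pos_tag)
```

Passing the tagger in as an argument keeps the helper independent of any particular model, so the same code works with a retrained tagger's tag method.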

answered Oct 26 '22 06:10

iceman_w

Question asked by mdogg.
