Matching words with NLTK's chunk parser

Question

NLTK's chunk parser's regular expressions can match POS tags, but can they also match specific words?
So, suppose I want to chunk any structure with a noun followed by the verb "left" (call this pattern L). For example, the sentence "the/DT dog/NN left/VB" should be chunked as
(S (DT the) (L (NN dog) (VB left))), but the sentence "the/DT dog/NN slept/VB" should not be chunked at all.

I haven't been able to find any documentation on the chunking regex syntax, and all examples I've seen only match POS tags.

Spaceghost · Accepted Answer

I had a similar problem, and after realizing that the regex pattern only examines tags, I changed the tag on the word I was interested in.
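One way to see why only tags matter is to look at the plain regular expression NLTK builds from a chunk tag pattern. This sketch assumes a recent NLTK 3.x, where tag_pattern2re_pattern is a module-level helper in nltk.chunk.regexp; it is an internal function, so treat it as an implementation detail that could change. The generated regex is matched against a string made only of tags (for example "<DT><NN><VBD>"), so the words never reach it.

```python
import re

from nltk.chunk.regexp import tag_pattern2re_pattern

# Translate a chunk tag pattern into the ordinary regex NLTK uses internally.
pattern = tag_pattern2re_pattern("<NN.*><LEFT>")
print(pattern)

# The regex runs over a string of tags only; words are not part of it.
print(bool(re.search(pattern, "<DT><NN><LEFT>")))  # a noun tag followed by LEFT
print(bool(re.search(pattern, "<DT><NN><VBD>")))   # no LEFT tag, so no match
```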

For example, I was trying to match a product name and version. A chunk rule like <NNP>+<CD> worked for "Internet Explorer 8.0" but failed on "Internet Explorer 8.0 SP2", where the tagger labelled SP2 as NNP.

Perhaps I could have trained a POS tagger, but I decided instead to just change that word's tag to SP; a chunk rule like <NNP>+<CD><SP>* then matches either example.
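A minimal sketch of this retagging approach, using hand-tagged tokens so no tagger model is needed. The chunk label PRODUCT and the rule "a token like SP1, SP2, ... is a service pack" are assumptions made for illustration; the answer does not say how the SP tokens were detected.

```python
import re

import nltk

# Hypothetical tagger output: "SP2" came back as NNP, as described above.
tagged = [("Internet", "NNP"), ("Explorer", "NNP"), ("8.0", "CD"), ("SP2", "NNP")]

# Assumed heuristic: SP followed by digits marks a service pack.
SERVICE_PACK = re.compile(r"SP\d+", re.IGNORECASE)


def tag_service_packs(tokens):
    """Return a copy of tokens with service-pack words retagged as SP."""
    return [(w, "SP") if SERVICE_PACK.fullmatch(w) else (w, t) for w, t in tokens]


# PRODUCT is a hypothetical chunk label; SP is the custom tag added above.
parser = nltk.RegexpParser("PRODUCT: {<NNP>+<CD><SP>*}")

with_sp = parser.parse(tag_service_packs(tagged))          # "... 8.0 SP2"
without_sp = parser.parse(tag_service_packs(tagged[:3]))   # "... 8.0"
print(with_sp)
print(without_sp)
```

Without the retagging step, the same rule still chunks "Internet Explorer 8.0" but leaves SP2 outside the chunk, because its NNP tag cannot follow <CD> in the pattern.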

Pratyush · Answer

The easiest way is to change the tags of the words: give each word you want to use in the regular expression its own tag.

Example:

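import nltk

# nltk.word_tokenize and nltk.pos_tag need NLTK's tokenizer and tagger
# models; install them once with nltk.download() if they are missing.
pos_tags = nltk.pos_tag(nltk.word_tokenize('Dog slept all night. Dog left at 8pm.'))

# Replace the tag of each word we want to refer to in the chunk grammar,
# so the regular expression (which only sees tags) can match it.
pos_tags = [
    (w, 'LEFT') if w == 'left' else (w, t)
    for w, t in pos_tags
]

# A noun (any NN* tag) followed by the retagged word "left".
grammar = "CHUNK: {<NN.*> <LEFT>}"
tree = nltk.RegexpParser(grammar).parse(pos_tags)
print(tree)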
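The same idea can be wrapped in a small helper and applied to the sentences from the question. This is a sketch under a few assumptions: retag_words and restore_tags are hypothetical helpers (not part of NLTK), the input is hand-tagged so no tagger model is required, and the chunk label L follows the question. Because retagging overwrites the original POS tag, restore_tags puts the original (word, tag) pairs back into the tree after parsing.

```python
import nltk


def retag_words(tagged, words, new_tag):
    """Hypothetical helper: give new_tag to every token whose lowercase form is in words."""
    targets = {w.lower() for w in words}
    return [(w, new_tag) if w.lower() in targets else (w, t) for w, t in tagged]


def restore_tags(tree, original):
    """Hypothetical helper: copy the original (word, tag) pairs back into the leaves."""
    for pos, pair in zip(tree.treepositions("leaves"), original):
        tree[pos] = pair
    return tree


# Chunk label L: a noun followed by the word "left" (retagged as LEFT).
parser = nltk.RegexpParser("L: {<NN.*><LEFT>}")

sentences = [
    [("the", "DT"), ("dog", "NN"), ("left", "VB")],
    [("the", "DT"), ("dog", "NN"), ("slept", "VB")],
]

results = []
for sent in sentences:
    tree = parser.parse(retag_words(sent, {"left"}, "LEFT"))
    results.append(restore_tags(tree, sent))
    print(results[-1])
```

Only the first sentence gets an L chunk; after restore_tags, "left" carries its original VB tag again, so later processing never sees the temporary LEFT tag.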

Tags: python, nltk

CromTheDestroyer
