What is the default chunker in the NLTK toolkit for Python?

I am using NLTK's default POS tagging and default tokenization, and they seem sufficient. I'd like to use its default chunker too.

I am reading the NLTK book, but it doesn't seem to mention a default chunker. Is there one?

TIMEX asked Nov 06 '09 13:11


2 Answers

You can get out-of-the-box named entity chunking with the nltk.ne_chunk() function. It takes a list of POS-tagged (word, tag) tuples:

nltk.ne_chunk([('Barack', 'NNP'), ('Obama', 'NNP'), ('lives', 'NNS'), ('in', 'IN'), ('Washington', 'NNP')])

results in:

Tree('S', [Tree('PERSON', [('Barack', 'NNP')]), Tree('ORGANIZATION', [('Obama', 'NNP')]), ('lives', 'NNS'), ('in', 'IN'), Tree('GPE', [('Washington', 'NNP')])])

It identifies Barack as a person, but Obama as an organization. So, not perfect.
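To work with a result like the one above, you can walk the returned Tree and collect the labelled chunks. The following is only a minimal sketch: the tree is rebuilt by hand from the output shown above (so no NLTK model data is needed), and `extract_entities` is a hypothetical helper name, not part of NLTK.

```python
from nltk.tree import Tree

# The tree printed above, rebuilt by hand for illustration.
result = Tree('S', [
    Tree('PERSON', [('Barack', 'NNP')]),
    Tree('ORGANIZATION', [('Obama', 'NNP')]),
    ('lives', 'NNS'),
    ('in', 'IN'),
    Tree('GPE', [('Washington', 'NNP')]),
])


def extract_entities(tree):
    """Hypothetical helper: return (label, text) for each chunk directly under the root."""
    entities = []
    for child in tree:
        # Chunks are nested Tree objects; unchunked tokens are plain (word, tag) tuples.
        if isinstance(child, Tree):
            text = " ".join(word for word, tag in child.leaves())
            entities.append((child.label(), text))
    return entities


entities = extract_entities(result)
print(entities)
```

Tokens that were not chunked, such as ('lives', 'NNS'), are skipped because they are tuples rather than subtrees.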

ealdent answered Oct 11 '22 06:10


I couldn't find a default chunker/shallow parser either, although the book describes how to build and train one with example features. Coming up with additional features to get good performance shouldn't be too difficult.

See Chapter 7's section on Training Classifier-based Chunkers.
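If training a classifier is more than you need, a rule-based chunker built with nltk.RegexpParser may be a simpler starting point. Note that this is a different technique from the classifier-based chunker mentioned above, and the sketch below makes assumptions: the single-rule noun phrase grammar and the hand-tagged sentence are illustrative choices, not an NLTK default.

```python
import nltk

# Assumed grammar: an NP is an optional determiner, any number of
# adjectives, then one or more nouns (any tag starting with NN).
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)

# Hand-tagged input, so no tokenizer or tagger model data is required.
sentence = [
    ("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]

tree = chunker.parse(sentence)

# Collect the text of every NP chunk found in the sentence.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(lambda t: t.label() == "NP")
]
print(noun_phrases)
```

In practice you would feed it the output of your existing tokenization and POS tagging steps instead of a hand-tagged list, and extend the grammar as needed.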

James Clarke answered Oct 11 '22 07:10