NLTK named entity recognition in dutch

Tags: python, nlp, nltk, named-entity-recognition

I am trying to extract named entities from Dutch text. I used nltk-trainer to train a tagger and a chunker on the conll2002 Dutch corpus. However, the chunker's parse method does not detect any named entities. Here is my code:

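import nltk

text = 'Christiane heeft een lam.'

# Load the tagger and chunker pickles trained with nltk-trainer
tagger = nltk.data.load('taggers/dutch.pickle')
chunker = nltk.data.load('chunkers/dutch.pickle')

# POS-tag the tokenized sentence, then chunk the tagged tokens
str_tags = tagger.tag(nltk.word_tokenize(text))
print(str_tags)

str_chunks = chunker.parse(str_tags)
print(str_chunks)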

The output of this program is:

[('Christiane', u'N'), ('heeft', u'V'), ('een', u'Art'), ('lam', u'Adj'), ('.', u'Punc')]
(S Christiane/N heeft/V een/Art lam/Adj ./Punc)

I expected "Christiane" to be detected as a named entity. What am I doing wrong?
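A chunk tree such as (S Christiane/N heeft/V een/Art lam/Adj ./Punc), with no nested subtrees, means the chunker gave every token the outside tag O. The CoNLL-2002 data marks names with IOB tags (B-/I- prefixes on PER, LOC, ORG and MISC), and each entity subtree in the output corresponds to one B-/I- run. The sketch below groups (word, POS, IOB) triples into entities to make that mapping concrete; iob_to_entities is a hypothetical plain-Python helper, not part of NLTK or nltk-trainer. In NLTK itself, nltk.chunk.tree2conlltags turns a chunk tree into such triples.

```python
def iob_to_entities(triples):
    """Hypothetical helper: group (word, pos, iob) triples into (type, words) entities."""
    entities = []
    current = None  # the (type, words) entity currently being extended
    for word, _pos, iob in triples:
        if iob == 'O':
            current = None  # an outside tag closes any open entity
            continue
        prefix, _, etype = iob.partition('-')
        if prefix == 'I' and current is not None and current[0] == etype:
            current[1].append(word)  # continue the open entity
        else:
            # B- tag, or an I- tag with no matching open entity: start a new one
            current = (etype, [word])
            entities.append(current)
    return entities


# All-O tags, as in the flat tree above: no entities.
flat = [('Christiane', 'N', 'O'), ('heeft', 'V', 'O'), ('een', 'Art', 'O'),
        ('lam', 'Adj', 'O'), ('.', 'Punc', 'O')]
# The tagging one would hope for: Christiane starts a PER entity.
expected = [('Christiane', 'N', 'B-PER'), ('heeft', 'V', 'O'), ('een', 'Art', 'O'),
            ('lam', 'Adj', 'O'), ('.', 'Punc', 'O')]

print(iob_to_entities(flat))      # → []
print(iob_to_entities(expected))  # → [('PER', ['Christiane'])]
```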


asked Jul 02 '12 11:07

user1491915

1 Answer

The conll2002 corpus contains both Spanish and Dutch text, so make sure to restrict training to the Dutch files with the --fileids parameter, as in python train_chunker.py conll2002 --fileids ned.train. Training on Spanish and Dutch together will give poor results.

The default algorithm is a tagger-based chunker, which does not work well on conll2002. Use a classifier-based chunker such as NaiveBayes instead. The full command might then look like this (I've confirmed that the resulting chunker recognizes "Christiane" as a "PER"):

python train_chunker.py conll2002 --fileids ned.train --classifier NaiveBayes --filename ~/nltk_data/chunkers/conll2002_ned_NaiveBayes.pickle
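Why the tagger-based chunker misses "Christiane" can be illustrated with a toy comparison, assuming (as with the unigram chunker in the NLTK book) that it picks each IOB tag from the POS tag alone: a proper name tagged N then looks like any other noun. A classifier-based chunker can also use features of the word itself. The sketch below is a self-contained toy with made-up training sentences and a hypothetical two-feature set (POS tag and capitalization); it hand-rolls categorical naive Bayes in plain Python rather than using NLTK's NaiveBayesClassifier, and it does not reproduce nltk-trainer's actual features.

```python
import math
from collections import Counter, defaultdict

# Made-up toy training data (word, POS, IOB), for illustration only.
TRAIN = [
    [('Jan', 'N', 'B-PER'), ('ziet', 'V', 'O'), ('de', 'Art', 'O'), ('hond', 'N', 'O'), ('.', 'Punc', 'O')],
    [('Marie', 'N', 'B-PER'), ('heeft', 'V', 'O'), ('een', 'Art', 'O'), ('kat', 'N', 'O'), ('.', 'Punc', 'O')],
    [('De', 'Art', 'O'), ('man', 'N', 'O'), ('kent', 'V', 'O'), ('Piet', 'N', 'B-PER'), ('.', 'Punc', 'O')],
    [('Een', 'Art', 'O'), ('vrouw', 'N', 'O'), ('leest', 'V', 'O'), ('een', 'Art', 'O'), ('boek', 'N', 'O'), ('.', 'Punc', 'O')],
]
TOKENS = [tok for sent in TRAIN for tok in sent]


def train_pos_chunker(tokens):
    """POS-only chunker: map each POS tag to its most frequent IOB tag."""
    by_pos = defaultdict(Counter)
    for _word, pos, iob in tokens:
        by_pos[pos][iob] += 1
    table = {pos: counts.most_common(1)[0][0] for pos, counts in by_pos.items()}
    return lambda word, pos: table.get(pos, 'O')  # unseen POS tags default to 'O'


def features(word, pos):
    """Hypothetical feature set: POS tag plus whether the word is capitalized."""
    return {'pos': pos, 'cap': word[:1].isupper()}


def train_naive_bayes(tokens):
    """Categorical naive Bayes with add-one (Laplace) smoothing."""
    class_counts = Counter(iob for _w, _p, iob in tokens)
    feat_counts = defaultdict(Counter)  # (label, feature name) -> value counts
    values = defaultdict(set)           # feature name -> values seen in training
    for word, pos, iob in tokens:
        for name, value in features(word, pos).items():
            feat_counts[(iob, name)][value] += 1
            values[name].add(value)
    total = sum(class_counts.values())

    def classify(word, pos):
        best_label, best_score = None, -math.inf
        for label, n in class_counts.items():
            score = math.log(n / total)  # log prior of the label
            for name, value in features(word, pos).items():
                count = feat_counts[(label, name)][value]
                score += math.log((count + 1) / (n + len(values[name])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return classify


sentence = [('Christiane', 'N'), ('heeft', 'V'), ('een', 'Art'), ('lam', 'Adj'), ('.', 'Punc')]
pos_chunker = train_pos_chunker(TOKENS)
nb_chunker = train_naive_bayes(TOKENS)
print([pos_chunker(w, p) for w, p in sentence])  # → ['O', 'O', 'O', 'O', 'O']
print([nb_chunker(w, p) for w, p in sentence])   # → ['B-PER', 'O', 'O', 'O', 'O']
```

In this toy, most N tokens are O, so the POS-only chunker can never output B-PER for a noun, while the capitalization feature lets naive Bayes separate names from common nouns. A real classifier-based chunker would typically use richer context (neighbouring words and tags), since capitalization alone is ambiguous at the start of a sentence.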

answered Oct 13 '22 01:10

Jacob
