Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Trying to use MEGAM as an NLTK ClassifierBasedPOSTagger?

I am currently trying to build a general purpose (or as general as is practical) POS tagger with NLTK. I have dabbled with the brown and treebank corpora for training, but will probably be settling on the treebank corpus.

Learning as I go, I am finding the classifier POS taggers are the most accurate. The Maximum Entity classifier is meant to be the most accurate, but I find it uses so much memory (and processing time) that I have to significantly reduce the training dataset, so the end result is less accurate than using the default Naive Bayes classifier.

It has been suggested that I use MEGAM. NLTK has some support for MEGAM, but all the examples I have found are for general classifiers (eg. a text classifier that uses a vector of word features), rather than a more specific POS tagger. Without having to recreate my own POS feature extractor and compiler (ie. I prefer to use the one already in NLTK), how can I used the MEGAM MaxEnt classifier? Ie. how can I drop it in some existing MaxEnt code that is along the lines of:

maxent_tagger = ClassifierBasedPOSTagger(train=training_sentences,
                                        classifier_builder=MaxentClassifier.train )
like image 522
winwaed Avatar asked Dec 17 '10 02:12

winwaed


2 Answers

This one liner should work for training a MEGAM MaxentClassifier for the ClassifierBasedPOSTagger. Of course, that assumes MEGAM is already installed (go here to download)

maxent_tagger = ClassifierBasedPOSTagger(train=train_sents, classifier_builder=lambda train_feats: MaxentClassifier.train(train_feats, algorithm='megam', max_iter=10, min_lldelta=0.1))
like image 75
Jacob Avatar answered Dec 04 '22 13:12

Jacob


For the future users:

Megam is now available on MAC:

$brew tap homebrew/science
$brew install megam

If you dont have XQuartz, it might ask you to get that first. Here is the direct download link: http://xquartz.macosforge.org/downloads/SL/XQuartz-2.7.5_rc4.dmg

like image 42
pg2455 Avatar answered Dec 04 '22 11:12

pg2455