
How to do POS tagging using SVM in Python?

I want to do POS tagging with an SVM on a non-English corpus in Python. It looks like NLTK does not support SVM-based tagging yet (http://www.nltk.org/_modules).

scikit-learn has an SVM module, so I installed scikit-learn and can use it from Python, but I cannot find any tutorials on POS tagging with an SVM.

I really have no clue what to do; any help would be appreciated.

Sam Black asked Sep 26 '22 20:09



1 Answer

Does it have to be an SVM? NLTK has built-in tools for POS tagging; see "Categorizing and Tagging Words" in the NLTK book.

If you want to use a custom classifier, look at http://www.nltk.org/api/nltk.classify.html and Ctrl+F "svm": NLTK provides a wrapper for scikit-learn algorithms called SklearnClassifier. Then take a look at http://www.nltk.org/api/nltk.tag.html and Ctrl+F "classifier": there is a class, nltk.tag.sequential.ClassifierBasedPOSTagger, which apparently can use wrapped classifiers from sklearn.

I haven't tried this but it might work.

EDIT: It should work like this:

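from nltk.classify import SklearnClassifier
from nltk.tag.sequential import ClassifierBasedPOSTagger
from sklearn.svm import SVC

# train_sents: list of sentences, each a list of (word, tag) tuples
clf = SklearnClassifier(SVC(), sparse=False)
# clf.train fits the wrapped SVC and returns the trained wrapper
cpos = ClassifierBasedPOSTagger(train=train_sents, classifier_builder=clf.train)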
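
For reference, train_sents in the snippet above has to be a list of sentences, each of which is a list of (word, tag) pairs. A minimal sketch of building it, assuming (hypothetically) that the corpus is plain text with one sentence per line, tokens written as word/TAG, and every token carrying a tag. The function name and the sample lines are made up for illustration:

```python
def read_tagged_corpus(lines, sep="/"):
    """Parse 'word/TAG word/TAG ...' lines into a list of tagged sentences."""
    sents = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        # Split on the LAST separator so words that contain '/' stay intact.
        sents.append([tuple(tok.rsplit(sep, 1)) for tok in tokens])
    return sents


# Made-up sample data; a real corpus would be read from a file instead.
sample = ["el/DET gato/NOUN duerme/VERB", "", "1/2/NUM taza/NOUN"]
train_sents = read_tagged_corpus(sample)
print(train_sents)
```

If your corpus uses a different layout (one token per line, tab-separated columns, and so on), only the parsing inside the loop needs to change; the shape of the result stays the same.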

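The only catch is that sklearn classifiers take numerical features only, so string-valued features have to be converted to numbers first. As far as I can tell, SklearnClassifier does this itself by running the feature dicts through scikit-learn's DictVectorizer, so check what your feature detector returns before adding a conversion step of your own.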
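
The conversion can be sketched without any library: each (feature name, string value) pair gets its own 0/1 column, i.e. one-hot encoding. This is a plain-Python illustration of the idea, not NLTK's or scikit-learn's actual implementation; the feature names (word, suffix2, prevtag) and the helper names are hypothetical.

```python
def token_features(words, i, prev_tag):
    """Hypothetical feature detector: string-valued features for token i."""
    word = words[i]
    return {"word": word.lower(), "suffix2": word[-2:], "prevtag": prev_tag}


def fit_vocabulary(feature_dicts):
    """Assign a column index to every distinct 'name=value' pair seen."""
    vocab = {}
    for feats in feature_dicts:
        for name, value in feats.items():
            vocab.setdefault(f"{name}={value}", len(vocab))
    return vocab


def one_hot(feats, vocab):
    """Encode one feature dict as a 0/1 vector; unseen pairs are ignored."""
    vec = [0] * len(vocab)
    for name, value in feats.items():
        idx = vocab.get(f"{name}={value}")
        if idx is not None:
            vec[idx] = 1
    return vec


# Made-up three-word sentence with its gold tags.
words = ["el", "gato", "duerme"]
tags = ["DET", "NOUN", "VERB"]
dicts = [token_features(words, i, tags[i - 1] if i else "<START>")
         for i in range(len(words))]
vocab = fit_vocabulary(dicts)
X = [one_hot(d, vocab) for d in dicts]
print(len(vocab), X[0])
```

Rows of X like these, with the tags as labels, are the kind of numeric input a classifier such as sklearn.svm.SVC can be fit on.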

hellpanderr answered Sep 30 '22 08:09
