I want to do POS tagging using an SVM with a non-English corpus in Python. It looks like NLTK does not support tagging using SVM yet (http://www.nltk.org/_modules).
scikit-learn has an SVM module, so I installed scikit-learn and used it in Python, but I cannot find any tutorials about POS tagging using SVM.
I really have no clue what to do. Any help would be appreciated.
Does it have to be an SVM? NLTK has built-in tools to do POS tagging; see the NLTK book chapter "Categorizing and Tagging Words".
If you want to use a custom classifier, look at http://www.nltk.org/api/nltk.classify.html and Ctrl+F "svm": NLTK provides a wrapper for scikit-learn algorithms called SklearnClassifier. Then take a look at http://www.nltk.org/api/nltk.tag.html and Ctrl+F "classifier": there is a class nltk.tag.sequential.ClassifierBasedPOSTagger, which apparently can use wrapped classifiers from sklearn.
I haven't tried this but it might work.
EDIT: It should work like this:
import nltk
from nltk.classify import SklearnClassifier
from sklearn.svm import SVC

# train_sents: a list of tagged sentences, each a list of (word, tag) pairs
clf = SklearnClassifier(SVC(), sparse=False)
cpos = nltk.tag.sequential.ClassifierBasedPOSTagger(
    train=train_sents,
    classifier_builder=lambda train_feats: clf.train(train_feats))
The only problem is that sklearn classifiers take numerical features only, so symbolic features such as words, suffixes and previous tags have to be encoded as numbers first (for example, one-hot encoded).
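To make the "convert to numbers" step concrete, here is a minimal plain-Python sketch of one-hot encoding feature dictionaries. The feature names ("word", "suffix") and the toy data are made up for illustration, and the helper functions are hypothetical, not part of NLTK or scikit-learn. In practice scikit-learn's DictVectorizer does this kind of encoding for string-valued features, so treat this only as a picture of the idea.

```python
def build_vocabulary(feature_dicts):
    """Assign a column index to every (feature name, value) pair seen in training."""
    vocab = {}
    for feats in feature_dicts:
        for name, value in feats.items():
            key = (name, value)
            if key not in vocab:
                vocab[key] = len(vocab)
    return vocab


def one_hot(feats, vocab):
    """Turn one feature dict into a 0/1 vector; unseen pairs are ignored."""
    vec = [0] * len(vocab)
    for name, value in feats.items():
        idx = vocab.get((name, value))
        if idx is not None:
            vec[idx] = 1
    return vec


# Toy per-token features (illustrative only).
train_feats = [
    {"word": "the", "suffix": "the"},
    {"word": "dog", "suffix": "dog"},
    {"word": "barks", "suffix": "rks"},
]

vocab = build_vocabulary(train_feats)
vectors = [one_hot(f, vocab) for f in train_feats]
```

Each row of vectors is then a purely numerical feature vector, which is the form an SVM needs. A word that never appeared in training simply produces zeros in the "word" columns.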