
Python: How to find the accuracy of an SVM text classifier for multilabel classes

I am using the code below, and I need to check its accuracy on X_train and X_test.

The following code works for my multi-label classification problem:

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier

X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "the big apple is great",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "people abbreviate new york city as nyc",
                    "the capital of great britain is london",
                    "london is in the uk",
                    "london is in england",
                    "london is in great britain",
                    "it rains a lot in london",
                    "london hosts the british museum",
                    "new york is great and so is london",
                    "i like london better than new york"])
y_train = [[0],[0],[0],[0]
            ,[0],[0],[1],[1]
            ,[1],[1],[1],[1]
            ,[2],[2]]
X_test = np.array(['nice day in nyc',
                   'the capital of great britain is london',
                   'i like london better than new york',
                   ])   
target_names = ['Class 1', 'Class 2','Class 3']

classifier = Pipeline([
    ('vectorizer', CountVectorizer(min_df=1,max_df=2)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
for item, labels in zip(X_test, predicted):
    # print() with a single argument works on both Python 2 and Python 3
    print('%s => %s' % (item, ', '.join(target_names[x] for x in labels)))

OUTPUT

nice day in nyc => Class 1
the capital of great britain is london => Class 2
i like london better than new york => Class 3

I would like to check the accuracy on the training and test datasets. The score function doesn't work for me; it raises an error saying that multilabel classifiers are not supported:

>>> classifier.score(X_train, X_test)

NotImplementedError: score is not supported for multilabel classifiers

Kindly help me get accuracy results for training and test data and choose an algorithm for our classification case.

user_az asked Oct 28 '13 07:10



1 Answer

If you want to get an accuracy score for your test set, you'll need to create an answer key, which you can call y_test. You can't know if your predictions are correct unless you know the correct answers.

Once you have an answer key, you can get the accuracy. The method you want is sklearn.metrics.accuracy_score.

I've written it out below:

from sklearn.metrics import accuracy_score

# ... everything else the same ...

# create an answer key, using the same label values as y_train (0, 1, 2)
# I hope this is correct!
y_test = [[0], [1], [2]]

# same as yours...
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)

# get the accuracy
print(accuracy_score(y_test, predicted))
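
As far as I know, recent scikit-learn releases no longer accept labels given as a list of label lists and expect a binary indicator matrix instead. The sketch below is one way to get both training and test accuracy under that assumption. It uses a shortened copy of the question's training data, a default CountVectorizer (not the min_df/max_df settings from the question), and an assumed answer key y_test, so treat it as an illustration, not a drop-in replacement.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# A shortened copy of the training data from the question.
X_train = ["new york is a hell of a town",
           "the big apple is great",
           "nyc is nice",
           "london is in the uk",
           "london is in england",
           "it rains a lot in london",
           "new york is great and so is london",
           "i like london better than new york"]
y_train = [[0], [0], [0], [1], [1], [1], [2], [2]]

X_test = ["nice day in nyc",
          "the capital of great britain is london",
          "i like london better than new york"]
# Assumed answer key: labels must come from the same set as y_train.
y_test = [[0], [1], [2]]

# Convert the label lists to a binary indicator matrix (one column per class).
mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(y_train)
Y_test = mlb.transform(y_test)

classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])
classifier.fit(X_train, Y_train)

# Subset accuracy: a sample counts only if all of its labels are predicted correctly.
train_accuracy = accuracy_score(Y_train, classifier.predict(X_train))
test_accuracy = accuracy_score(Y_test, classifier.predict(X_test))
print('train accuracy: %.2f' % train_accuracy)
print('test accuracy: %.2f' % test_accuracy)
```

Training accuracy on its own says little about how well the model generalizes, so the test figure is the one to compare between algorithms.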

Also, sklearn has several other metrics besides accuracy. See them here: sklearn.metrics
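
For multilabel problems, plain accuracy is strict because a sample only counts when every label matches. A small sketch of a few alternatives from sklearn.metrics follows; the y_true and y_pred values are made up purely to show the calls, and the labels are binarized with MultiLabelBinarizer first.

```python
from sklearn.metrics import accuracy_score, f1_score, hamming_loss
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical answer key and predictions, each a list of label lists.
y_true = [[0], [1], [2], [0, 1]]
y_pred = [[0], [1], [1], [0]]

# Fix the class order so both matrices share the same columns.
mlb = MultiLabelBinarizer(classes=[0, 1, 2])
Y_true = mlb.fit_transform(y_true)
Y_pred = mlb.transform(y_pred)

# Fraction of samples whose full label set is predicted exactly.
subset_acc = accuracy_score(Y_true, Y_pred)
# Fraction of individual label slots that are wrong.
ham = hamming_loss(Y_true, Y_pred)
# F1 computed over all label slots pooled together.
micro_f1 = f1_score(Y_true, Y_pred, average='micro')

print('subset accuracy: %.3f' % subset_acc)
print('hamming loss:    %.3f' % ham)
print('micro F1:        %.3f' % micro_f1)
```

Hamming loss and micro-averaged F1 give partial credit when only some labels of a sample are right, which is often more informative than subset accuracy when samples carry several labels.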

mayhewsw answered Nov 15 '22 17:11
