I am using the code below and need to check the accuracy of its predictions on X_train and X_test.
The code works for my classification problem with multi-label classes:
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier

X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "the big apple is great",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "people abbreviate new york city as nyc",
                    "the capital of great britain is london",
                    "london is in the uk",
                    "london is in england",
                    "london is in great britain",
                    "it rains a lot in london",
                    "london hosts the british museum",
                    "new york is great and so is london",
                    "i like london better than new york"])
# one list of label ids per training sentence (ids index into target_names)
y_train = [[0], [0], [0], [0],
           [0], [0], [1], [1],
           [1], [1], [1], [1],
           [2], [2]]
X_test = np.array(['nice day in nyc',
                   'the capital of great britain is london',
                   'i like london better than new york'])
target_names = ['Class 1', 'Class 2', 'Class 3']

classifier = Pipeline([
    ('vectorizer', CountVectorizer(min_df=1, max_df=2)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
for item, labels in zip(X_test, predicted):
    print('%s => %s' % (item, ', '.join(target_names[x] for x in labels)))
OUTPUT
nice day in nyc => Class 1
the capital of great britain is london => Class 2
i like london better than new york => Class 3
I would like to check the accuracy on both the training and the test dataset. The score function doesn't work for me; it raises an error saying that multilabel classifiers are not supported:
>>> classifier.score(X_train, X_test)
NotImplementedError: score is not supported for multilabel classifiers
Please help me get accuracy results for the training and test data, and choose an algorithm for our classification case.
If you want to get an accuracy score for your test set, you'll need to create an answer key, which you can call y_test. You can't know if your predictions are correct unless you know the correct answers. Note also that your call passes X_test where the labels should go; a score compares predictions with true labels, not two sets of inputs.
Once you have an answer key, you can get the accuracy. The function you want is sklearn.metrics.accuracy_score.
I've written it out below:
from sklearn.metrics import accuracy_score

# ... everything else the same ...

# create an answer key: one label list per test sentence, using the same
# 0-based ids as y_train (0 -> 'Class 1', 1 -> 'Class 2', 2 -> 'Class 3')
y_test = [[0], [1], [2]]

# same as yours...
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)

# get the accuracy
print(accuracy_score(y_test, predicted))
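For what it's worth, here is a self-contained sketch for recent scikit-learn versions, which (as far as I know) no longer accept a list of lists as multilabel targets. It assumes the labels are first converted to a binary indicator matrix with MultiLabelBinarizer, and that the answer key uses the same 0-based ids as y_train. The y_test values are my guess at the correct answers, not something given in the question. It reports exact-match accuracy on both the training and the test sentences.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

X_train = ["new york is a hell of a town",
           "new york was originally dutch",
           "the big apple is great",
           "new york is also called the big apple",
           "nyc is nice",
           "people abbreviate new york city as nyc",
           "the capital of great britain is london",
           "london is in the uk",
           "london is in england",
           "london is in great britain",
           "it rains a lot in london",
           "london hosts the british museum",
           "new york is great and so is london",
           "i like london better than new york"]
y_train = [[0]] * 6 + [[1]] * 6 + [[2]] * 2

X_test = ["nice day in nyc",
          "the capital of great britain is london",
          "i like london better than new york"]
# Assumed answer key (0-based ids, same scheme as y_train).
y_test = [[0], [1], [2]]

# Turn the label lists into a 0/1 indicator matrix, one column per class.
mlb = MultiLabelBinarizer(classes=[0, 1, 2])
Y_train = mlb.fit_transform(y_train)
Y_test = mlb.transform(y_test)

classifier = Pipeline([
    ("vectorizer", CountVectorizer(min_df=1, max_df=2)),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
])
classifier.fit(X_train, Y_train)

# For multilabel input, accuracy_score is exact-match (subset) accuracy:
# a sample counts as correct only if its whole label row is predicted right.
train_accuracy = accuracy_score(Y_train, classifier.predict(X_train))
test_accuracy = accuracy_score(Y_test, classifier.predict(X_test))
print("train accuracy:", train_accuracy)
print("test accuracy:", test_accuracy)
```

Since every sentence here has exactly one label, you could probably also pass flat labels such as y_train = [0, 0, ..., 2, 2]; in that case classifier.score(X_test, y_test) should work directly.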
Also, sklearn has several other metrics besides accuracy. See them here: sklearn.metrics
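To give an idea of what a few of those other metrics report for multilabel output, here is a small sketch. The two indicator matrices are made-up illustration data, not the predictions from the question.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Hypothetical ground truth and predictions: rows are samples, columns are classes.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0]])  # last sample gets the wrong class

# Exact-match accuracy: 2 of the 3 rows are fully correct.
subset_accuracy = accuracy_score(y_true, y_pred)

# Hamming loss: fraction of individual label cells that are wrong (2 of 9).
label_error_rate = hamming_loss(y_true, y_pred)

# Micro-averaged F1: pools true/false positives over all classes.
micro_f1 = f1_score(y_true, y_pred, average="micro")

print(subset_accuracy, label_error_rate, micro_f1)
```

Exact-match accuracy is the strictest of the three; Hamming loss and micro-F1 give partial credit when only some labels of a sample are right.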