Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Facing ValueError: Target is multiclass but average='binary'

I'm a newbie to python as well as machine learning. As per my requirement, I'm trying to use Naive Bayes algorithm for my dataset.

I'm able to find out the accuracy but trying to find out precision and recall for the same. But, it is throwing the following error:

ValueError: Target is multiclass but average='binary'. Please choose another average setting.

Can anyone please suggest me how to proceed with it. I have tried using average ='micro' in the precision and the recall scores.It worked without any errors but it is giving the same score for accuracy, precision, recall.

My dataset:

train_data.csv:

review,label
Colors & clarity is superb,positive
Sadly the picture is not nearly as clear or bright as my 40 inch Samsung,negative

test_data.csv:

review,label
The picture is clear and beautiful,positive
Picture is not clear,negative

My Code:

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import confusion_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score


def load_data(filename):
    reviews = list()
    labels = list()
    with open(filename) as file:
        file.readline()
        for line in file:
            line = line.strip().split(',')
            labels.append(line[1])
            reviews.append(line[0])

    return reviews, labels

X_train, y_train = load_data('/Users/abc/Sep_10/train_data.csv')
X_test, y_test = load_data('/Users/abc/Sep_10/test_data.csv')

vec = CountVectorizer() 

X_train_transformed =  vec.fit_transform(X_train) 

X_test_transformed = vec.transform(X_test)

clf= MultinomialNB()
clf.fit(X_train_transformed, y_train)

score = clf.score(X_test_transformed, y_test)
print("score of Naive Bayes algo is :" , score)

y_pred = clf.predict(X_test_transformed)
print(confusion_matrix(y_test,y_pred))

print("Precision Score : ",precision_score(y_test,y_pred,pos_label='positive'))
print("Recall Score :" , recall_score(y_test, y_pred, pos_label='positive') )
like image 257
Intrigue777 Avatar asked Sep 11 '18 05:09

Intrigue777


1 Answers

You need to add the 'average' param. According to the documentation:

average : string, [None, ‘binary’ (default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’]

This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:

Do this:

print("Precision Score : ",precision_score(y_test, y_pred, 
                                           pos_label='positive'
                                           average='micro'))
print("Recall Score : ",recall_score(y_test, y_pred, 
                                           pos_label='positive'
                                           average='micro'))

Replace 'micro' with any one of the above options except 'binary'. Also, in the multiclass setting, there is no need to provide the 'pos_label' as it will be anyways ignored.

Update for comment:

Yes, they can be equal. Its given in the user guide here:

Note that for “micro”-averaging in a multiclass setting with all labels included will produce equal precision, recall and F, while “weighted” averaging may produce an F-score that is not between precision and recall.

like image 157
Vivek Kumar Avatar answered Nov 18 '22 16:11

Vivek Kumar