
How to set a value for a specific threshold in SVC model and generate a confusion matrix?

I need to set a specific threshold value and generate a confusion matrix. The data is in a CSV file (11.1 MB), available for download here: https://drive.google.com/file/d/1cQFp7HteaaL37CefsbMNuHqPzkINCVzs/view?usp=sharing

First, I received an error message: "AttributeError: predict_proba is not available when probability=False". So I used this to correct it:

svc = SVC(C=1e9, gamma=1e-07)
svc_calibrated = CalibratedClassifierCV(svc)
svc_model = svc_calibrated.fit(X_train, y_train)

I searched a lot on the internet, but I didn't quite understand how a specific threshold value is customized. It sounds pretty hard. Now I get the wrong output:

array([[   0,    0],
       [5359,   65]])

I have no idea what is wrong.

I need help, and I'm new to this. Thanks.

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('fraud_data.csv')

X = df.iloc[:,:-1]
y = df.iloc[:,-1]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)



def answer_four():
    from sklearn.metrics import confusion_matrix
    from sklearn.svm import SVC
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.model_selection import train_test_split


    svc = SVC(C=1e9, gamma=1e-07)
    svc_calibrated = CalibratedClassifierCV(svc)
    svc_model = svc_calibrated.fit(X_train, y_train)

    # set threshold as -220
    y_pred = (svc_model.predict_proba(X_test)[:,1] >= -220) 

    conf_matrix = confusion_matrix(y_pred, svc_model.predict(X_test))

    return conf_matrix
answer_four()

This function should return a confusion matrix, a 2x2 numpy array with 4 integers.

asked Oct 24 '19 by Gizelly


2 Answers

This code produces the expected output. Besides using the confusion matrix incorrectly in my previous code, I should also have used decision_function and obtained the output by filtering with the -220 threshold.

def answer_four():
    import numpy as np
    from sklearn.metrics import confusion_matrix
    from sklearn.svm import SVC

    # SVC with no kernel specified; the default is rbf
    svc = SVC(C=1e9, gamma=1e-07).fit(X_train, y_train)

    # decision_function: predict confidence scores for the samples
    y_score = svc.decision_function(X_test)

    # Set a threshold of -220: the threshold is the boundary separating
    # the classes, applied to the scores of the already-trained model
    y_score = np.where(y_score > -220, 1, 0)
    conf_matrix = confusion_matrix(y_test, y_score)

    return conf_matrix

answer_four()
#output: 
array([[5320,   24],
       [  14,   66]])
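
As a side note on the original predict_proba attempt: predict_proba returns probabilities in [0, 1], so a threshold of -220 (which only makes sense for decision_function scores) is satisfied by every sample, labelling everything as the positive class. If you want a probability-based threshold instead, here is a minimal sketch using the question's CalibratedClassifierCV (the 0.5 cut-off is only illustrative, not tuned):

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

svc = SVC(C=1e9, gamma=1e-07)
svc_model = CalibratedClassifierCV(svc).fit(X_train, y_train)

# Probabilities of the positive class, always in [0, 1]
proba = svc_model.predict_proba(X_test)[:, 1]

# Threshold the probabilities (0.5 is an illustrative choice)
y_pred = (proba >= 0.5).astype(int)
print(confusion_matrix(y_test, y_pred))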
answered Sep 30 '22 by Gizelly


You are using the confusion matrix in the wrong way.

The idea behind the confusion matrix is to have a picture as to how good our predictions y_pred are compared with the ground truth y_true, usually in a test set.
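
A minimal toy illustration of that signature (made-up labels, not from the question's data): confusion_matrix takes the ground truth first and the predictions second, with rows corresponding to true classes and columns to predicted classes.

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1]  # ground truth
y_pred = [0, 1, 1, 1]  # model predictions

# Rows = true classes, columns = predicted classes:
# [[1, 1],    one true 0 predicted as 0, one true 0 predicted as 1
#  [0, 2]]    both true 1s predicted as 1
print(confusion_matrix(y_true, y_pred))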

What you actually do here is compute a "confusion matrix" between your predictions with the custom threshold of -220 (y_pred) and some other predictions with the default threshold (the output of svc_model.predict(X_test)), which does not make any sense.

Your ground truth for the test set is y_test; so, to get the confusion matrix with the default threshold, you should use

confusion_matrix(y_test, svc_model.predict(X_test))

To get the confusion matrix with your custom threshold of -220, you should use

confusion_matrix(y_test, y_pred)
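
Putting the two pieces together, here is a minimal sketch of the corrected flow (assuming the X_train, X_test, y_train, y_test produced by the question's train_test_split):

import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

svc = SVC(C=1e9, gamma=1e-07).fit(X_train, y_train)

# Real-valued confidence scores (signed distances from the hyperplane)
y_score = svc.decision_function(X_test)

# Apply the custom threshold of -220 to the scores
y_pred = np.where(y_score > -220, 1, 0)

# Ground truth first, predictions second
print(confusion_matrix(y_test, y_pred))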

See the documentation for more details on its usage (the documentation is your best friend, and should always be the first place to look when you have issues or doubts).

answered Sep 30 '22 by desertnaut