
Interpret predicted probabilities in multiclass logistic regression

I have a dataset as given below, where A, B, C, D, E are the features and 'T' is the target variable.

A     B    C     D     E       T
32    22   55    76    98      3
12    41   90    56    33      2
31    78   99    67    89      1
51    85   71    21    37      1
......
......

Now, I have applied a multiclass logistic regression classifier using scikit-learn and obtained the predicted values and a matrix of probabilities as follows:

 A     B    C     D     E       T   Predicted    Probability
32    22   55    76    98       3     3           0.35
12    41   90    56    33       2     1           0.68
31    78   99    67    89       1     3           0.31
51    85   71    21    37       1     1           0.25

Now I just want to ask how to interpret the output probabilities. 1) As far as I have studied, Python by default gives the probability of the event being 1. If that is the case, is 0.35 the probability of the sample being class 1? Or 2) is 0.35 the probability that the first sample belongs to class "3"? And how can I calculate the probabilities for the remaining two classes? Something like:

 A     B    C     D     E       T   Predicted     P_1    P_2    P_3
32    22   55    76    98       3     3           0.35   0.20   0.45
12    41   90    56    33       2     1           0.68   0.10   0.22
31    78   99    67    89       1     3           0.31   0.40   0.29
51    85   71    21    37       1     1           0.25   0.36   0.39
James asked Nov 28 '25 01:11



1 Answer

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(random_state = 1)
lr.fit(x_train,y_train)

This fits the model on the training data. To get the class probabilities for the test set:

lr.predict_proba(x_test)

Suppose the dataset contains three classes. The output will be something like:

array([[  2.69011925e-02,   5.40807755e-01,   4.32291053e-01],
   [  9.32525056e-01,   6.73606657e-02,   1.14278375e-04],
   [  5.24023874e-04,   3.24718067e-01,   6.74757909e-01],
   [  4.75066650e-02,   5.86482429e-01,   3.66010906e-01],
   [  1.83396339e-02,   4.77753541e-01,   5.03906825e-01],
   [  8.82971089e-01,   1.16720108e-01,   3.08803089e-04],
   [  4.64149328e-02,   7.17011933e-01,   2.36573134e-01],
   [  1.65574625e-02,   3.29502329e-01,   6.53940209e-01],
   [  8.70375470e-01,   1.29512862e-01,   1.11667567e-04],
   [  8.51328361e-01,   1.48584654e-01,   8.69851797e-05]])

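In the output array, each row represents a sample and each of the 3 columns gives the probability of one class, so every row sums to 1. The columns follow the order of lr.classes_, which holds the sorted class labels, so the value in a given column is the probability of that specific class and not of "event 1" in general.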
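To make the column-to-class mapping concrete, here is a minimal sketch that builds the P_1, P_2, P_3 layout the question asks for. The toy data (random features, labels 1 to 3) and the names header, proba and predicted are assumptions made up for illustration, not the asker's dataset, so the printed numbers carry no meaning beyond showing the layout.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: 5 features (A-E) and a target T with labels 1, 2, 3.
rng = np.random.default_rng(0)
X = rng.integers(10, 100, size=(60, 5)).astype(float)
y = np.tile([1, 2, 3], 20)

lr = LogisticRegression(max_iter=5000, random_state=1)
lr.fit(X, y)

proba = lr.predict_proba(X)   # shape (n_samples, n_classes)
predicted = lr.predict(X)

# Column j of proba holds the probability of the label lr.classes_[j],
# so the column names can be derived directly from lr.classes_.
header = ["T", "Predicted"] + [f"P_{label}" for label in lr.classes_]
print("  ".join(header))
for t, p, row in zip(y[:4], predicted[:4], proba[:4]):
    print(t, p, "  ".join(f"{v:.2f}" for v in row))
```

Deriving the names from lr.classes_ rather than hard-coding them keeps the labels correct even if the classes are not 1, 2, 3.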

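To get the probabilities for a single sample, such as the first one (assuming x_test is a NumPy array), use either of:

lr.predict_proba(x_test[0:1, :])[0]   # the 0:1 slice keeps the input 2D, as predict_proba requires
lr.predict_proba(x_test)[0, :]        # equivalent: first row of the full output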

Output:

array([ 0.02690119,  0.54080775,  0.43229105])

i.e., the probabilities of that sample belonging to each of the three classes.
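As a rough check of how these per-sample probabilities relate to the predicted label, the sketch below compares lr.predict with the column of highest probability. The data is again a hypothetical toy set, and the names best_col, top_proba and row_sums are illustrative only. If a single "probability" column is meant to be the probability of the predicted class, it would correspond to top_proba here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: three loosely separated classes labelled 1, 2, 3.
rng = np.random.default_rng(1)
X = rng.normal(size=(90, 5))
y = np.repeat([1, 2, 3], 30)
X[y == 2] += 1.5
X[y == 3] -= 1.5

lr = LogisticRegression(max_iter=1000, random_state=1).fit(X, y)
proba = lr.predict_proba(X)

row_sums = proba.sum(axis=1)           # each row is a distribution over the classes
best_col = proba.argmax(axis=1)        # index of the most probable class per sample
predicted = lr.classes_[best_col]      # map column index back to the class label
top_proba = proba.max(axis=1)          # probability of the predicted class

print(np.allclose(row_sums, 1.0))
print(np.array_equal(predicted, lr.predict(X)))
```

With three classes the probability of the predicted class can never fall below 1/3, since it is the largest of three values that sum to 1.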

Faraz Gerrard Jamal answered Nov 29 '25 15:11
