How to interpret Weka Logistic Regression output?

Tags:

weka

Please help me interpret the results of logistic regression produced by weka.classifiers.functions.Logistic from the Weka library.

I use numeric data from Weka examples:

@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no

To create the logistic regression model I use the command:

java -cp $WEKA_INS/weka.jar weka.classifiers.functions.Logistic -t $WEKA_INS/data/weather.numeric.arff -T $WEKA_INS/data/weather.numeric.arff -d ./weather.numeric.model.arff

Here the three arguments mean:

-t <name of training file> : Sets training file.
-T <name of test file> : Sets test file. 
-d <name of output file> : Sets model output file.

Running the above command produces the following output:

Logistic Regression with ridge parameter of 1.0E-8
Coefficients...
              Class
Variable                    yes
===============================
outlook=sunny           -6.4257
outlook=overcast        13.5922
outlook=rainy           -5.6562
temperature             -0.0776
humidity                -0.1556
windy                    3.7317
Intercept                22.234

Odds Ratios...
              Class
Variable                    yes
===============================
outlook=sunny            0.0016
outlook=overcast    799848.4264
outlook=rainy            0.0035
temperature              0.9254
humidity                 0.8559
windy                   41.7508


Time taken to build model: 0.05 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===
Correctly Classified Instances          11               78.5714 %
Incorrectly Classified Instances         3               21.4286 %
Kappa statistic                          0.5532
Mean absolute error                      0.2066
Root mean squared error                  0.3273
Relative absolute error                 44.4963 %
Root relative squared error             68.2597 %
Total Number of Instances               14     

=== Confusion Matrix ===
 a b   <-- classified as
 7 2 | a = yes
 1 4 | b = no

Questions:

1) First section of the report:

Coefficients...
              Class
Variable                    yes
===============================
outlook=sunny           -6.4257
outlook=overcast        13.5922
outlook=rainy           -5.6562
temperature             -0.0776
humidity                -0.1556
windy                    3.7317
Intercept                22.234

1.1) Do I understand correctly that the "Coefficients" are in fact weights that are applied to each attribute before adding them together to produce the value of the class attribute "play" equal to "yes"?

2) Second section of the report:

Odds Ratios...
              Class
Variable                    yes
===============================
outlook=sunny            0.0016
outlook=overcast    799848.4264
outlook=rainy            0.0035
temperature              0.9254
humidity                 0.8559
windy                   41.7508

2.1) What is the meaning of "Odds Ratios"?

2.2) Do they all also relate to the class attribute "play" equal to "yes"?

2.3) Why is the value of "outlook=overcast" so much bigger than the value of "outlook=sunny"?

3) Last section of the report:

=== Confusion Matrix ===
 a b   <-- classified as
 7 2 | a = yes
 1 4 | b = no

3.1) What is the meaning of the Confusion Matrix?

Thanks a lot for your help!

211

asked Oct 02 '13 11:10

Anton Ashanin

1 Answer


Updated from comment below: Yes, the coefficients are the weights applied to each attribute. Each numeric attribute is multiplied by its coefficient, a nominal attribute such as outlook contributes the coefficient of whichever value the instance has, and the results are added together. The "Intercept" value is added to that sum as is; it is not multiplied by any of your variables. The sum is then plugged into the logistic function 1/(1+exp(-weighted_sum)) to obtain a probability. The result is the probability that the new instance belongs to class yes (> 0.5 means yes).
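The weighted sum and logistic function described above can be sketched in Python using the coefficients printed in the question. This is only an illustration, not Weka's own code. The function name prob_yes is made up for this sketch, and the coding of windy (0 for TRUE, 1 for FALSE) is an assumption; it is the coding under which the printed coefficients reproduce the confusion matrix in the question.

```python
import math

# Coefficients for class "yes", copied from the Weka output in the question.
COEF = {
    "outlook=sunny": -6.4257,
    "outlook=overcast": 13.5922,
    "outlook=rainy": -5.6562,
    "temperature": -0.0776,
    "humidity": -0.1556,
    "windy": 3.7317,
}
INTERCEPT = 22.234


def prob_yes(outlook, temperature, humidity, windy):
    """Return the modelled probability that play = yes for one instance."""
    # Assumption: windy {TRUE, FALSE} is coded as 0 for TRUE and 1 for FALSE.
    windy_code = 0.0 if windy else 1.0
    weighted_sum = (
        INTERCEPT                              # added as is, not multiplied
        + COEF["outlook=" + outlook]           # indicator is 1 for the matching value
        + COEF["temperature"] * temperature
        + COEF["humidity"] * humidity
        + COEF["windy"] * windy_code
    )
    return 1.0 / (1.0 + math.exp(-weighted_sum))


# The 14 rows of the weather data from the question.
DATA = [
    ("sunny", 85, 85, False, "no"),
    ("sunny", 80, 90, True, "no"),
    ("overcast", 83, 86, False, "yes"),
    ("rainy", 70, 96, False, "yes"),
    ("rainy", 68, 80, False, "yes"),
    ("rainy", 65, 70, True, "no"),
    ("overcast", 64, 65, True, "yes"),
    ("sunny", 72, 95, False, "no"),
    ("sunny", 69, 70, False, "yes"),
    ("rainy", 75, 80, False, "yes"),
    ("sunny", 75, 70, True, "yes"),
    ("overcast", 72, 90, True, "yes"),
    ("overcast", 81, 75, False, "yes"),
    ("rainy", 71, 91, True, "no"),
]

# Predict "yes" when the probability exceeds 0.5, then count the hits.
predictions = ["yes" if prob_yes(o, t, h, w) > 0.5 else "no"
               for o, t, h, w, _ in DATA]
correct = sum(p == row[4] for p, row in zip(predictions, DATA))

print(f"P(yes) for the first row: {prob_yes(*DATA[0][:4]):.3f}")
print(f"correctly classified: {correct} of {len(DATA)}")
```

Under that coding the sketch classifies 11 of the 14 rows correctly, which agrees with the "Correctly Classified Instances" line in the question.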
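The odds ratios indicate how large an influence a change in that value (or a change to that value) will have on the prediction. Each odds ratio is exp(coefficient), so it relates to class yes just as the coefficients do. For example, exp(-0.1556) is about 0.8559 for humidity, meaning each additional unit of humidity multiplies the odds of play = yes by roughly 0.86. I think this link does a great job explaining the odds ratios. The value of outlook=overcast is so large because if the outlook is overcast the odds are very good that play will equal yes: all four overcast instances in your data have play = yes.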
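The link between the two report sections can be checked numerically: exponentiating each coefficient gives the corresponding odds ratio. A minimal Python sketch, using only the numbers printed in the question (small differences are expected because Weka prints rounded coefficients):

```python
import math

# Coefficients and odds ratios as printed by Weka in the question.
coefficients = {
    "outlook=sunny": -6.4257,
    "outlook=overcast": 13.5922,
    "outlook=rainy": -5.6562,
    "temperature": -0.0776,
    "humidity": -0.1556,
    "windy": 3.7317,
}
reported_odds_ratios = {
    "outlook=sunny": 0.0016,
    "outlook=overcast": 799848.4264,
    "outlook=rainy": 0.0035,
    "temperature": 0.9254,
    "humidity": 0.8559,
    "windy": 41.7508,
}

# Odds ratio = exp(coefficient): the factor by which the odds of "yes"
# are multiplied when the variable increases by one unit.
computed = {name: math.exp(coef) for name, coef in coefficients.items()}

for name, value in computed.items():
    print(f"{name:18s} exp({coefficients[name]:8.4f}) = {value:14.4f}"
          f"   Weka: {reported_odds_ratios[name]}")
```

An odds ratio below 1 (sunny, rainy, temperature, humidity) lowers the odds of yes, and one above 1 (overcast, windy) raises them.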
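The confusion matrix simply shows how many of the test data points were correctly and incorrectly classified. Each row is an actual class and each column is a predicted class. In your example, 7 instances of class a (yes) were classified as a, whereas 2 were misclassified as b (no). Likewise, 1 instance of class b was misclassified as a and 4 were classified correctly as b. Your question is answered more thoroughly in this question: How to read the classifier confusion matrix in WEKA.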
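As a small illustration, the summary figures in the report can be recomputed from the confusion matrix alone. The Python sketch below uses the matrix from the question; the kappa formula is the standard Cohen's kappa, which I am assuming is what Weka reports as "Kappa statistic".

```python
# Confusion matrix from the question: rows are actual classes,
# columns are predicted classes.
labels = ["yes", "no"]
matrix = [
    [7, 2],  # actual yes: 7 predicted yes, 2 predicted no
    [1, 4],  # actual no:  1 predicted yes, 4 predicted no
]

n = len(labels)
total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(n))  # diagonal = correct predictions
accuracy = correct / total

# Cohen's kappa: agreement beyond what the row and column totals
# would produce by chance.
row_totals = [sum(row) for row in matrix]
col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
kappa = (accuracy - expected) / (1 - expected)

print(f"correct: {correct} of {total}")
print(f"accuracy: {accuracy * 100:.4f} %")
print(f"kappa: {kappa:.4f}")
```

The diagonal (7 and 4) gives the 11 correctly classified instances, and the off-diagonal cells (2 and 1) give the 3 errors.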

165

answered Oct 13 '22 10:10

Walter
