
SGD model "overconfidence"

I'm working on a binary classification problem using Apache Mahout. The algorithm I use is OnlineLogisticRegression, and the model I currently have strongly tends to produce predictions that are either 1 or 0, with no intermediate values.

Please suggest a way to tune or tweak the algorithm to make it produce more intermediate values in predictions.

Thanks in advance!

Alexander Oleynikov asked May 17 '26 21:05


2 Answers

What is the test error rate of the classifier? If it's near zero then being confident is a feature, not a bug.

If the test error rate is high (or at least not low), then the classifier might be overfitting the training set: measure the difference between the training error and the test error. In that case, increasing regularization, as rrenaud suggested, might help.
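As a rough illustration of that check, here is a minimal Python sketch that compares the error rate on training data with the error rate on held-out data. It does not use Mahout; the function names (`error_rate`, `generalization_gap`), the 0.5 threshold, and the scores are all made up for the example, and in practice the probabilities would come from your own model.

```python
def error_rate(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction disagrees with the label."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    wrong = sum(
        1 for p, y in zip(probabilities, labels)
        if (1 if p >= threshold else 0) != y
    )
    return wrong / len(labels)


def generalization_gap(train_probs, train_labels, test_probs, test_labels):
    """Test error minus training error; a large positive gap suggests overfitting."""
    return error_rate(test_probs, test_labels) - error_rate(train_probs, train_labels)


# Hypothetical scores: perfect on the training data, much worse on held-out data.
train_probs, train_labels = [0.99, 0.01, 0.98, 0.02], [1, 0, 1, 0]
test_probs, test_labels = [0.99, 0.97, 0.01, 0.03], [1, 0, 0, 1]

train_error = error_rate(train_probs, train_labels)
gap = generalization_gap(train_probs, train_labels, test_probs, test_labels)
print(f"train error = {train_error:.2f}, gap = {gap:.2f}")
```

A gap near zero with a high error on both sets points away from overfitting and toward an underpowered model or a calibration problem.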

If your classifier is not overfitting, then there might be an issue with the probability calibration. Logistic regression models (e.g. using the logit link function) should yield reasonably well-calibrated probabilities, provided the problem is approximately linearly separable and the labels are not too noisy. You can check the calibration of the probabilities with a plot as explained in this paper. If this really is a calibration issue, then implementing a custom calibration step based on Platt scaling or isotonic regression might help fix it.
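To make the calibration check and the Platt scaling idea concrete, here is a small Python sketch under stated assumptions. It is independent of Mahout, all names (`reliability_table`, `fit_platt`, and so on) are invented for the example, and the held-out data is hypothetical. The Platt step is simplified: it fits `sigmoid(a * score + b)` by plain gradient descent on the raw labels, whereas Platt's original method uses smoothed targets and a different optimizer.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p, eps=1e-6):
    """Inverse of sigmoid, with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def reliability_table(probabilities, labels, n_bins=5):
    """Return (mean predicted, observed positive rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            rows.append((mean_p, observed, len(members)))
    return rows


def fit_platt(scores, labels, learning_rate=0.1, epochs=2000):
    """Fit calibrated = sigmoid(a * score + b) by batch gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b


# Hypothetical held-out set: the model always says 0.99 or 0.01,
# but is right only 80% of the time in each group.
held_out_probs = [0.99] * 10 + [0.01] * 10
held_out_labels = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8

rows = reliability_table(held_out_probs, held_out_labels)
for mean_p, observed, count in rows:
    print(f"predicted {mean_p:.2f}  observed {observed:.2f}  n={count}")

a, b = fit_platt([logit(p) for p in held_out_probs], held_out_labels)
calibrated_high = sigmoid(a * logit(0.99) + b)
print(f"0.99 is recalibrated to {calibrated_high:.2f}")
```

In a well-calibrated model the predicted and observed columns roughly agree in every bin. The calibration map should be fitted on data the classifier was not trained on, otherwise it inherits the same overconfidence.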

ogrisel answered May 20 '26 04:05


From reading the Mahout AbstractOnlineLogisticRegression docs, it looks like you can control the regularization parameter lambda. Increasing lambda should mean your weights are closer to 0, and hence your predictions are more hedged.
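The effect can be seen in a toy Python sketch of SGD logistic regression. This is not Mahout's implementation: it uses a plain L2 penalty as a stand-in for whatever prior your model is configured with, and the function names, data, learning rate, and lambda values are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, features):
    """Predicted probability of class 1 for one feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def train_sgd(examples, lam, learning_rate=0.1, epochs=200):
    """SGD logistic regression with an L2 penalty of strength lam."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for features, label in examples:
            p = predict(weights, features)
            for j, x in enumerate(features):
                # Log-loss gradient plus the shrinkage term lam * w.
                weights[j] -= learning_rate * ((p - label) * x + lam * weights[j])
    return weights


# Hypothetical, linearly separable data; the first feature is a constant bias term.
examples = [
    ([1.0, 2.0], 1),
    ([1.0, 1.0], 1),
    ([1.0, -1.0], 0),
    ([1.0, -2.0], 0),
]

unregularized = train_sgd(examples, lam=0.0)
regularized = train_sgd(examples, lam=0.1)

confident = predict(unregularized, [1.0, 2.0])
hedged = predict(regularized, [1.0, 2.0])
print(f"lambda = 0.0 -> p = {confident:.4f}")
print(f"lambda = 0.1 -> p = {hedged:.4f}")
```

On separable data the unregularized weights keep growing with more training, which pushes predictions toward 0 and 1; the penalty caps the weights and so keeps the probabilities away from the extremes.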

Rob Neuhaus answered May 20 '26 05:05