According to the documentation it is possible to specify different loss functions for SGDClassifier. As far as I understand, log loss is a cross-entropy loss function which theoretically can handle soft labels, i.e. labels given as probabilities in [0, 1].

The question is: is it possible to use SGDClassifier with the log loss function out of the box for classification problems with soft labels? And if not, how can this task (linear classification on soft labels) be solved using scikit-learn?
UPDATE:
Because of the way the target is labeled and the nature of the problem, hard labels don't give good results. But it is still a classification problem (not regression), and I want to keep the probabilistic interpretation of the prediction, so plain regression doesn't work out of the box either. The cross-entropy loss function handles soft labels in the target naturally. It seems, though, that all loss functions for linear classifiers in scikit-learn can only handle hard labels.

So the question is probably: how can I specify my own loss function for SGDClassifier, for example? It seems scikit-learn doesn't take a modular approach here, and changes would need to be made somewhere inside its sources.
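For illustration, a minimal sketch of what happens out of the box (assuming a recent scikit-learn version, where the loss is spelled "log_loss"; the toy X and y_soft below are just placeholders):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.random((100, 3))
y_soft = rng.random(100)              # soft labels: probabilities in [0, 1]

clf = SGDClassifier(loss="log_loss")  # spelled "log" in older scikit-learn versions
try:
    clf.fit(X, y_soft)                # rejected: classifiers expect discrete class labels
except ValueError as err:
    print(err)
```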
I recently had this problem and came up with a nice fix that seems to work.
Basically, transform your targets to log-odds-ratio space using the inverse sigmoid (logit) function, then fit a linear regression. To do inference, take the sigmoid of the predictions from the linear regression model.

So say we have soft targets/labels y ∈ (0, 1) (make sure to clamp the targets to, say, [1e-8, 1 - 1e-8] to avoid instability issues when we take logs). We take the inverse sigmoid and then fit a linear regression (assuming the predictor variables are in a matrix X):
import numpy as np
from sklearn.linear_model import LinearRegression

y = np.clip(y, 1e-8, 1 - 1e-8)   # clamp soft targets for numerical stability
inv_sig_y = np.log(y / (1 - y))  # inverse sigmoid (logit): transform to log-odds-ratio space

lr = LinearRegression()
lr.fit(X, inv_sig_y)
Then to make predictions:
def sigmoid(x):
    # logistic function: maps log-odds back to probabilities in (0, 1)
    ex = np.exp(x)
    return ex / (1 + ex)

preds = sigmoid(lr.predict(X_new))
This seems to work, at least for my use case. My guess is that it's not far off what happens behind the scenes for LogisticRegression anyway.
Bonus: this also seems to work with other regression models in sklearn, e.g. RandomForestRegressor.
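For example, a minimal sketch of the same logit-space trick with a tree ensemble (variable names as in the snippets above; the RandomForestRegressor hyperparameters are only illustrative):

```python
from sklearn.ensemble import RandomForestRegressor

# Same recipe: fit the regressor on the logit-transformed soft targets,
# then squash its predictions back through the sigmoid.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X, inv_sig_y)
rf_preds = sigmoid(rf.predict(X_new))
```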
According to the docs,

"The ‘log’ loss gives logistic regression, a probabilistic classifier."

In general, a loss function is of the form Loss(prediction, target), where prediction is the model's output and target is the ground-truth value. In the case of logistic regression, prediction is a value in (0, 1) (i.e., a "soft label"), while target is 0 or 1 (i.e., a "hard label").
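As a side note, the cross-entropy formula itself, -(t·log(p) + (1-t)·log(1-p)), is perfectly well-defined for a soft target t ∈ [0, 1]; the restriction to hard labels comes from how scikit-learn's classifiers validate their targets, not from the loss. A minimal hand-rolled sketch (not scikit-learn's internal code):

```python
import numpy as np

def cross_entropy(t, p, eps=1e-12):
    # Cross-entropy between a (possibly soft) target t in [0, 1]
    # and a predicted probability p in (0, 1).
    p = np.clip(p, eps, 1 - eps)
    return -(t * np.log(p) + (1 - t) * np.log(1 - p))

print(cross_entropy(1.0, 0.9))  # hard target
print(cross_entropy(0.7, 0.9))  # soft target, still well-defined
```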
So, in answer to your question, it depends on whether you are referring to the prediction or the target. Generally speaking, the form of the labels ("hard" or "soft") is determined by the algorithm chosen for the prediction and by the data on hand for the target.
If your data has "hard" labels, and you desire a "soft" label output by your model (which can be thresholded to give a "hard" label), then yes, logistic regression is in this category.
If your data has "soft" labels, then you would have to choose a threshold to convert them to "hard" labels before using typical classification methods (e.g., logistic regression). Otherwise, you could use a regression method where the model is fit to predict the "soft" target. In this latter approach, your model could give values outside of (0, 1), and this would have to be handled.
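A minimal sketch of that latter approach (plain linear regression on the soft targets, with out-of-range predictions clipped back into [0, 1]; the variable names X, y_soft, and X_new are just placeholders):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

reg = LinearRegression().fit(X, y_soft)  # y_soft holds probabilities in [0, 1]
raw = reg.predict(X_new)                 # may fall outside (0, 1)
probs = np.clip(raw, 0.0, 1.0)           # one simple way to handle that
```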