
SGDClassifier with class_weight=auto fails on scikit-learn 0.15 but not 0.14

Tags:

scikit-learn

When I train a scikit-learn v0.15 SGDClassifier with these options: SGDClassifier(loss='log', class_weight=None, penalty='l2'), training completes without error. But when I train the same classifier with class_weight='auto' on scikit-learn v0.15, I get this error:

  return self.model.fit(X, y)
  File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/linear_model/stochastic_gradient.py", line 485, in fit
    sample_weight=sample_weight)
  File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/linear_model/stochastic_gradient.py", line 389, in _fit
    classes, sample_weight, coef_init, intercept_init)
  File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/linear_model/stochastic_gradient.py", line 336, in _partial_fit
    y_ind)
  File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/utils/class_weight.py", line 43, in compute_class_weight
    raise ValueError("classes should have valid labels that are in y")
ValueError: classes should have valid labels that are in y

What could cause it?

For reference, here's the documentation on class_weight:

Preset for the class_weight fit parameter. Weights associated with classes. If not given, all classes are supposed to have weight one. The “auto” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies.
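To make "inversely proportional to class frequencies" concrete, here is a minimal sketch in plain Python. The helper name `inverse_frequency_weights` is hypothetical, and rescaling the weights so that their mean is 1 is an assumption made for readability; it is not taken from the scikit-learn source, so the exact numbers scikit-learn computes may differ.

```python
from collections import Counter


def inverse_frequency_weights(y):
    """Return a dict mapping each label in y to a weight proportional to 1 / count.

    Hypothetical helper for illustration only; the rescaling to mean 1
    is an assumption, not necessarily what scikit-learn does.
    """
    counts = Counter(y)
    inverse = {label: 1.0 / count for label, count in counts.items()}
    mean_inverse = sum(inverse.values()) / len(inverse)
    return {label: value / mean_inverse for label, value in inverse.items()}


# Toy labels: "a" is three times as frequent as "b".
weights = inverse_frequency_weights(["a", "a", "a", "b"])
print(weights)
```

With these toy labels the rarer class "b" ends up with three times the weight of "a", which is the behaviour the documentation describes.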

Rose Perrone asked Jul 17 '14 16:07



1 Answer

I think this may be a bug in scikit-learn. As a workaround, encode the labels with LabelEncoder before fitting and decode the predictions afterwards:

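from sklearn.preprocessing import LabelEncoder

# Map the original labels to integers 0..n_classes-1 before fitting.
le = LabelEncoder()
y_encoded = le.fit_transform(y)
self.model.fit(X, y_encoded)
# Map the integer predictions back to the original labels.
pred = le.inverse_transform(self.model.predict(X))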
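For anyone who wants to try the encode/fit/decode round trip outside the asker's class, the following is a self-contained sketch. The toy data and the helper names `fit_with_encoded_labels` and `predict_original_labels` are made up for illustration. It assumes a recent scikit-learn release, where class_weight='auto' has been replaced by class_weight='balanced'; the loss is left at its default because the name of the logistic loss differs between versions. It does not reproduce the 0.15 error, it only shows the shape of the workaround.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import LabelEncoder


def fit_with_encoded_labels(X, y, **sgd_params):
    """Encode labels as integers, fit an SGDClassifier, return (model, encoder)."""
    encoder = LabelEncoder()
    y_encoded = encoder.fit_transform(y)
    model = SGDClassifier(**sgd_params)
    model.fit(X, y_encoded)
    return model, encoder


def predict_original_labels(model, encoder, X):
    """Predict integer classes and map them back to the original labels."""
    return encoder.inverse_transform(model.predict(X))


# Made-up toy data: two well-separated clusters with string labels.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y = np.array(["neg", "neg", "neg", "pos", "pos"])

# 'balanced' is the modern counterpart of 'auto' (assumes a recent scikit-learn).
model, encoder = fit_with_encoded_labels(
    X, y, class_weight="balanced", penalty="l2", random_state=0
)
pred = predict_original_labels(model, encoder, X)
print(pred)
```

Returning the encoder together with the model keeps the label mapping available at prediction time, so callers never see the integer codes.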
Danny Sullivan answered Nov 16 '22 08:11