Role of class_weight in loss functions for linearSVC and LogisticRegression

I am trying to figure out what exactly the loss function formula is, and how I can calculate it manually, when class_weight='auto' for svm.SVC, svm.LinearSVC and linear_model.LogisticRegression.

For balanced data, say you have a trained classifier: clf_c. Logistic loss should be (am I correct?):

import numpy as np

def logistic_loss(x, y, w, b, b0):
    '''
    x: nxp data matrix where n is number of data points and p is number of features.
    y: nx1 vector of true labels (-1 or 1); labels coded as (0, 1) are mapped to (-1, 1).
    w: nx1 vector of per-sample weights (vector of 1./n for balanced data).
    b: px1 vector of feature weights.
    b0: intercept.
    '''
    s = y
    if 0 in np.unique(y):
        # Labels are coded as (0, 1): map them to (-1, 1).
        s = 2. * y - 1
    # Weighted sum of the per-sample logistic losses log(1 + exp(-s * f(x))).
    l = np.dot(w, np.log(1 + np.exp(-s * (np.dot(x, np.squeeze(b)) + b0))))
    return l

I realized that LogisticRegression has predict_log_proba(), which gives you exactly that when data is balanced:

b, b0 = clf_c.coef_, clf_c.intercept_
w = np.ones(len(y)) / len(y)
# Column index of the true class for each sample.
idx = np.floor((y+1)/2).astype(np.int8)
np.isclose(-clf_c.predict_log_proba(x)[np.arange(len(x)), idx].mean(), logistic_loss(x, y, w, b, b0))

Note, np.floor((y+1)/2).astype(np.int8) simply maps y=(-1,1) to y=(0,1).

But this does not work when data is imbalanced.

What's more, you expect the classifier (here, LogisticRegression) to perform similarly (in terms of loss function value) when data is balanced and class_weight=None versus when data is imbalanced and class_weight='auto'. I need a way to calculate the loss function (without the regularization term) for both scenarios and compare them.

In short, what does class_weight = 'auto' exactly mean? Does it mean class_weight = {-1 : (y==1).sum()/(y==-1).sum() , 1 : 1.} or rather class_weight = {-1 : 1./(y==-1).sum() , 1 : 1./(y==1).sum()}?

Any help is much much appreciated. I tried going through the source code, but I am not a programmer and I am stuck. Thanks a lot in advance.

What does class_weight='balanced' do?

Balanced class weights can be calculated automatically. Set class_weight='balanced' to adjust weights inversely proportional to class frequencies in the input data.

What is class_weight in logistic regression?

The LogisticRegression class provides the class_weight argument, which can be specified as a model hyperparameter. class_weight is a dictionary that maps each class label (e.g. 0 and 1) to the weighting to apply in the calculation of the negative log likelihood when fitting the model.

What is class_weight in SVC?

In SVC, class_weight sets the parameter C of class i to class_weight[i]*C. For example, class_weight={0: 1, 1: 2} doubles the misclassification penalty for class 1 relative to class 0.

What is the difference between SVC and LinearSVC?

The main difference is that LinearSVC only lets you fit a linear classifier, whereas SVC lets you choose from a variety of kernels, including non-linear ones. However, SVC scales poorly with the number of samples, so it can be very slow on large datasets.

`class_weight` heuristics

I am a bit puzzled by your first proposition for the class_weight='auto' heuristic, as:

class_weight = {-1 : (y == 1).sum() / (y == -1).sum(), 
                1 : 1.}

is the same as your second proposition if we normalize it so that the weights sum to one.
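A quick numeric check of that equivalence, as a sketch on made-up labels (the array y below is an illustrative assumption, not data from the question):

```python
import numpy as np

# Hypothetical imbalanced labels: 8 negatives, 2 positives.
y = np.array([-1] * 8 + [1] * 2)

n_neg = (y == -1).sum()
n_pos = (y == 1).sum()

# First proposition: ratio of class counts for -1, weight 1 for +1.
first = np.array([n_pos / n_neg, 1.0])
# Second proposition: inverse class counts.
second = np.array([1.0 / n_neg, 1.0 / n_pos])

# Normalize each pair of weights so that it sums to one.
first_normalized = first / first.sum()
second_normalized = second / second.sum()
print(first_normalized, second_normalized)
```

Both normalized vectors come out identical, with the minority class receiving the larger share.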

Anyway, to understand what class_weight="auto" does, see this question: "What is the difference between class weight = none and auto in svm scikit learn".

I am copying the answer here for later comparison:

This means that each class you have (in classes) gets a weight equal to 1 divided by the number of times that class appears in your data (y), so classes that appear more often will get lower weights. This is then further divided by the mean of all the inverse class frequencies.

Note how this is not completely obvious ;).
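The quoted description of 'auto' can be sketched in plain NumPy. This is only my reading of the quote, not the library's code; auto_class_weights is a hypothetical helper, and the class counts are chosen to match the three-class example used in this answer:

```python
import numpy as np

def auto_class_weights(y):
    """Sketch of the 'auto' heuristic as described in the quote.

    Assumes integer labels 0..k-1 so that np.bincount can count them.
    """
    counts = np.bincount(y)        # number of samples per class
    inverse_freq = 1.0 / counts    # 1 / (times each class appears)
    # Divide by the mean of the inverse class frequencies.
    return inverse_freq / inverse_freq.mean()

# Labels with 57, 396 and 547 samples in classes 0, 1 and 2.
y = np.repeat([0, 1, 2], [57, 396, 547])
weights = auto_class_weights(y)
print(weights)  # approximately [2.4036 0.3460 0.2505]
```

By construction these weights always average to one, whatever the class counts are.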

This heuristic is deprecated and will be removed in 0.18. It will be replaced by another heuristic, class_weight='balanced'.

The 'balanced' heuristic weighs classes proportionally to the inverse of their frequency.

From the docs:

The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data: n_samples / (n_classes * np.bincount(y)).

np.bincount(y) is an array with the element i being the count of class i samples.

Here's a bit of code to compare the two:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.utils import compute_class_weight

n_classes = 3
n_samples = 1000

X, y = make_classification(n_samples=n_samples, n_features=20, n_informative=10, 
    n_classes=n_classes, weights=[0.05, 0.4, 0.55])

print("Count of samples per class: ", np.bincount(y))
balanced_weights = n_samples /(n_classes * np.bincount(y))
# Equivalent to the following, using version 0.17+:
# compute_class_weight("balanced", [0, 1, 2], y)

print("Balanced weights: ", balanced_weights)
print("'auto' weights: ", compute_class_weight("auto", [0, 1, 2], y))

Output:

Count of samples per class:  [ 57 396 547]
Balanced weights:  [ 5.84795322  0.84175084  0.60938452]
'auto' weights:  [ 2.40356854  0.3459682   0.25046327]

The loss functions

Now the real question is: how are these weights used to train the classifier?

I don't have a thorough answer here unfortunately.

For SVC and LinearSVC the docstring is pretty clear:

Set the parameter C of class i to class_weight[i]*C for SVC.

So high weights mean less regularization for the class and a higher incentive for the SVM to classify it properly.
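To make the per-class C concrete, here is a minimal sketch of the data term of a linear SVM objective under that docstring, assuming the plain hinge loss and labels in (-1, 1). weighted_hinge_loss is a hypothetical helper, not a scikit-learn function, and LinearSVC's default loss is the squared hinge, so treat this as an illustration of the weighting only:

```python
import numpy as np

def weighted_hinge_loss(X, y, coef, intercept, C=1.0, class_weight=None):
    """Hinge-loss data term with a per-class C (regularization term omitted).

    y holds labels in {-1, +1}; class_weight maps each label to its weight.
    """
    if class_weight is None:
        class_weight = {-1: 1.0, 1: 1.0}
    margins = y * (X @ coef + intercept)
    hinge = np.maximum(0.0, 1.0 - margins)
    # Each sample is penalized with C * class_weight[its class].
    per_sample_C = C * np.array([class_weight[int(label)] for label in y])
    return float(np.sum(per_sample_C * hinge))

# Tiny made-up example: one feature, two samples per class.
X = np.array([[2.0], [0.5], [-1.0], [0.2]])
y = np.array([1, 1, -1, -1])
coef, intercept = np.array([1.0]), 0.0

unweighted = weighted_hinge_loss(X, y, coef, intercept)
weighted = weighted_hinge_loss(X, y, coef, intercept,
                               class_weight={-1: 1.0, 1: 2.0})
print(unweighted, weighted)
```

Doubling the weight of class 1 doubles the contribution of the one class-1 sample that violates the margin and leaves the class -1 terms unchanged.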

I do not know how they work with logistic regression. I'll try to look into it but most of the code is in liblinear or libsvm and I'm not too familiar with those.
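One plausible reading, which I have not checked against the liblinear source, is that each sample's log-loss term is multiplied by the weight of its class. Under that assumption, the unregularized loss could be computed by hand as sketched below (weighted_logistic_loss is a hypothetical helper, labels are assumed to be in (-1, 1), and the data is made up):

```python
import numpy as np

def weighted_logistic_loss(X, y, coef, intercept, class_weight):
    """Sum of per-sample logistic losses, each scaled by its class weight.

    Assumes class weights act as per-sample multipliers; regularization
    is left out on purpose.
    """
    z = X @ coef + intercept
    per_sample_weight = np.array([class_weight[int(label)] for label in y])
    # logaddexp(0, -y*z) equals log(1 + exp(-y*z)), computed stably.
    return float(np.sum(per_sample_weight * np.logaddexp(0.0, -y * z)))

# Made-up imbalanced data: three negatives, one positive, all-zero model.
X = np.zeros((4, 2))
y = np.array([-1, -1, -1, 1])
coef, intercept = np.zeros(2), 0.0

# 'balanced' weights: n_samples / (n_classes * count of the class).
counts = {-1: 3, 1: 1}
balanced = {label: len(y) / (2 * n) for label, n in counts.items()}

loss = weighted_logistic_loss(X, y, coef, intercept, balanced)
print(loss)
```

With the 'balanced' weights the per-sample weights sum to n_samples, so dividing this value by len(y) puts it on the same scale as a mean loss computed on balanced data.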

However, note that the weights in class_weight do not directly influence methods such as predict_proba. They change its output because the classifier optimizes a different loss function.
Not sure this is clear, so here's a snippet to explain what I mean (you need to run the first one for the imports and variable definitions):

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(class_weight="auto")
lr.fit(X, y)
# We get some probabilities...
print(lr.predict_proba(X))

new_lr = LogisticRegression(class_weight={0: 100, 1: 1, 2: 1})
new_lr.fit(X, y)
# We get different probabilities...
print(new_lr.predict_proba(X))

# Let's cheat a bit and hand-modify our new classifier.
new_lr.intercept_ = lr.intercept_.copy()
new_lr.coef_ = lr.coef_.copy()

# Now we get the SAME probabilities.
np.testing.assert_array_equal(new_lr.predict_proba(X), lr.predict_proba(X))

Hope this helps.

Tags: svm, scikit-learn, logistic-regression

JRun


ldirer
