
eli5: show_weights() with two labels

I'm trying out eli5 to understand how individual terms contribute to the prediction of certain classes.

You can run this script:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.datasets import fetch_20newsgroups

#categories = ['alt.atheism', 'soc.religion.christian']
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics']

np.random.seed(1)
train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=7)
test = fetch_20newsgroups(subset='test', categories=categories, shuffle=True, random_state=7)

# named `bow` so it matches the references in the Pipeline and in show_weights()
bow = CountVectorizer(stop_words='english')
clf = LogisticRegression()
pipel = Pipeline([('bow', bow),
                 ('classifier', clf)])

pipel.fit(train.data, train.target)

import eli5
eli5.show_weights(clf, vec=bow, top=20)

Problem:

When working with two labels, the output is unfortunately limited to only one table:

categories = ['alt.atheism', 'soc.religion.christian']

[Image: eli5 output with a single weights table]

However, when using three labels, it outputs three tables, one per label.

categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics']

[Image: eli5 output with three weights tables]

Is it a bug in the software that y=0 is missing from the first output, or am I missing a statistical point? I would expect to see two tables in the first case.

Christopher, asked Aug 02 '18 17:08


1 Answer

This has nothing to do with eli5; it comes from how scikit-learn (in this case LogisticRegression()) treats two categories. With only two categories the problem becomes a binary one, so the fitted classifier stores a single row of coefficients instead of one row per class.

Look at the attributes of LogisticRegression:

coef_ : array, shape (1, n_features) or (n_classes, n_features)

Coefficient of the features in the decision function.
coef_ is of shape (1, n_features) when the given problem is binary.

intercept_ : array, shape (1,) or (n_classes,)

Intercept (a.k.a. bias) added to the decision function.

If fit_intercept is set to False, the intercept is set to zero.
intercept_ is of shape (1,) when the problem is binary.
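The shapes quoted above can be checked directly. The following is a minimal sketch on a tiny made-up corpus (the documents and labels are invented for illustration and stand in for the 20 newsgroups data, so that nothing has to be downloaded):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy corpus: the first four documents cover two topics,
# the last two add a third topic.
docs = [
    "god faith church bible",
    "atheism argument evidence reason",
    "faith prayer church",
    "atheism reason logic",
    "image render pixel graphics",
    "graphics card render polygon",
]
labels_two = [1, 0, 1, 0]
labels_three = [1, 0, 1, 0, 2, 2]

bow = CountVectorizer()
X = bow.fit_transform(docs)

# Two labels: fit on the first four documents only.
binary_clf = LogisticRegression().fit(X[:4], labels_two)
# Three labels: fit on all six documents.
multi_clf = LogisticRegression().fit(X, labels_three)

binary_shape = binary_clf.coef_.shape  # one row of weights
multi_shape = multi_clf.coef_.shape    # one row of weights per class

print("binary:", binary_shape, binary_clf.intercept_.shape)
print("three classes:", multi_shape, multi_clf.intercept_.shape)
```

With two labels there is one row of weights and one intercept; with three labels there is one row and one intercept per class.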

Because coef_ is of shape (1, n_features) in the binary case, and this coef_ is what eli5.show_weights() uses, there is only one row of weights to show, hence the single table.

Hope this makes it clear.

Vivek Kumar, answered Dec 27 '22 02:12