I have a multiclass classification task with 10 classes. As such, I used sklearn's OneHotEncoder to transform the one-column labels into 10-column labels. I then tried to fit the training data. Although this worked with RandomForestClassifier, I got the error below when fitting with GaussianNB:
ValueError: bad input shape (1203L, 10L)
I understand the allowed shape of y in these two classifiers is different:
GaussianNB:
y : array-like, shape (n_samples,)
RandomForest:
y : array-like, shape = [n_samples] or [n_samples, n_outputs]
The question is, why is this? Wouldn't this be contradictory to "All classifiers in scikit-learn do multiclass classification out-of-the-box"? Any way to go around it? Thanks!
The question is, why is this?
It is because of a slight misunderstanding: in scikit-learn you do not one-hot encode labels. You pass them as a one-dimensional vector of labels, so instead of
1 0 0
0 1 0
0 0 1
you literally pass
1 2 3
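A minimal sketch of the point above (the data and labels here are made up for illustration): GaussianNB happily accepts the raw integer labels of shape (n_samples,).

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy 1-D features; three classes labelled 1, 2, 3 directly,
# with no one-hot encoding anywhere.
X = np.array([[0.1], [0.2], [1.1], [1.2], [2.1], [2.2]])
y = np.array([1, 1, 2, 2, 3, 3])  # raw labels, shape (n_samples,)

clf = GaussianNB().fit(X, y)  # no "bad input shape" error
print(clf.predict([[1.15]]))
```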
So why does random forest accept a different scheme? Because that scheme is not for the multiclass setting! It is for the multi-label setting, where each instance can have many labels at once, like
1 1 0
1 1 1
0 0 0
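To make the distinction concrete, here is a hypothetical multi-label sketch: RandomForestClassifier is fit on a binary indicator matrix of shape (n_samples, n_outputs), and its predictions come back in the same shape.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy 1-D features; each row of Y marks which of 3 labels apply
# to that sample (this is multi-label, not one-hot multiclass).
X = np.array([[0.0], [0.5], [1.0], [1.5]])
Y = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 0, 0],
              [0, 1, 0]])

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, Y)
pred = clf.predict(X)
print(pred.shape)  # one column per label
```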
Wouldn't this be contradictory to "All classifiers in scikit-learn do multiclass classification out-of-the-box"?
On the contrary: it is the simplest convention. You never need one-hot encoding unless the problem is genuinely multi-label.
Any way to go around it?
Yup, just do not encode: pass the raw labels :-)
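And if the labels have already been one-hot encoded, a sketch of undoing it: take the argmax over columns to recover a one-dimensional label vector (here the column index stands in for the original class), then fit as usual.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Already one-hot encoded labels (toy example).
Y_onehot = np.array([[1, 0, 0],
                     [0, 1, 0],
                     [0, 0, 1]])

# Recover raw labels of shape (n_samples,): the index of the 1 in each row.
y = Y_onehot.argmax(axis=1)

X = np.array([[0.0], [1.0], [2.0]])
GaussianNB().fit(X, y)  # no shape error now
print(list(y))
```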