ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,)

I have created a simple piece of code that uses OneHotEncoder.

from sklearn.preprocessing import OneHotEncoder
X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]
onehotencoder = OneHotEncoder(categories=[0])
X = onehotencoder.fit_transform(X).toarray()

I just want to apply fit_transform to column index 0 of X, i.e. to the values [0, 0, 1, 2] that you can see in X. But it raises this error:

ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,).

Can anyone solve this problem? I am stuck on it.

asked Dec 30 '19 05:12 by arga wirawan



2 Answers

You need to use ColumnTransformer to specify the column index, not the categories parameter.

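The constructor parameter categories is for listing the distinct category values explicitly, with one entry per column of the input. That is why categories=[0] (one entry) fails on X, which has two columns. For example, for a single column you could provide categories=[[0, 1, 2]] explicitly, but the default 'auto' determines the values from the data. Further, instead of a list of column indices, you can pass a slice() object to ColumnTransformer.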
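A minimal sketch of the shape rule behind the error, assuming a recent scikit-learn: categories must contain one list of values per input column. The names column0, encoder and onehot are illustrative only.

```python
from sklearn.preprocessing import OneHotEncoder

X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]

# X has 2 columns, but categories=[0] has only 1 entry, so fit() rejects it.
try:
    OneHotEncoder(categories=[0]).fit(X)
    error_message = None
except ValueError as exc:
    error_message = str(exc)  # contains "Shape mismatch: ..."

# To encode only column 0 with explicit categories, pass a 2D input that has
# a single column, and give one list of category values for that one feature.
column0 = [[row[0]] for row in X]  # [[0], [0], [1], [2]]
encoder = OneHotEncoder(categories=[[0, 1, 2]])
onehot = encoder.fit_transform(column0).toarray()  # dense float array, shape (4, 3)
```

Here onehot has a 1 in position 0, 0, 1, 2 for the four rows respectively; the ColumnTransformer approach keeps column 1 alongside it without slicing X by hand.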

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]

ct = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(categories='auto'), [0])],   # The column indices to transform (here [0], but it could be e.g. [0, 1, 3])
    remainder='passthrough'                                         # Leave the rest of the columns untouched
)

X = ct.fit_transform(X)
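The slice() option mentioned above can be sketched like this, assuming a recent scikit-learn. slice(0, 1) selects the same single column as [0]; because X mixes ints and strings, the result comes back as an object array.

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]

# slice(0, 1) picks column 0 only, exactly like the list [0].
ct = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(), slice(0, 1))],
    remainder='passthrough',  # column 1 ('a'/'b') is appended unchanged
)
result = ct.fit_transform(X)  # 3 one-hot columns followed by the passthrough column
```

A slice is mainly handy when the columns to encode are contiguous, e.g. slice(0, 3) for the first three.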
answered Oct 20 '22 20:10 by TRiNE



The pandas.get_dummies() function can also do the same, as below:

import numpy as np
import pandas as pd
X = np.array([[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']])
X = np.array(pd.concat([pd.get_dummies(X[:, 0]), pd.DataFrame(X[:, 1])], axis = 1))
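A possibly simpler pandas variant, sketched under the assumption of pandas 0.23 or later (for the dtype argument): wrap X in a DataFrame and let get_dummies encode only the listed column. The prefix 'col0' is an illustrative choice, not something the answer uses.

```python
import pandas as pd

X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]
df = pd.DataFrame(X)  # integer column labels 0 and 1

# Encode only column 0; untouched columns come first, then the dummy columns.
encoded = pd.get_dummies(df, columns=[0], prefix='col0', dtype=int)
```

Keeping the data in a DataFrame preserves column names, which makes it easier to see which dummy column belongs to which original value.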
answered Oct 20 '22 21:10 by shubh
