
scikit-learn ColumnTransformer TypeError: Cannot clone object. You should provide an instance of scikit-learn estimator instead of a class

I am working with data that looks roughly like this:

  CATEGORY | NUMBER VALUE | ID
  FRUIT    |      15      | XCD
  VEGGIES  |      12      | ZYK



from sklearn.preprocessing import LabelEncoder, OneHotEncoder
data = data.iloc[:, :].values  # convert the DataFrame to a NumPy array
enc = LabelEncoder()
data[:, 0] = enc.fit_transform(data[:, 0])  # integer-encode the CATEGORY column
data

Output:

array([[1, 15, 'XCD'],
       [2, 12, 'ZYK']])

Then...

from sklearn.compose import ColumnTransformer
ct = ColumnTransformer(transformers=[('encode',OneHotEncoder,[0])],remainder='passthrough')
dataset = np.array(ct.fit_transform(data))

gives

TypeError: Cannot clone object. You should provide an instance of scikit-learn estimator instead of a class.
Kyle Papili asked Jun 11 '20 17:06


2 Answers

I believe I resolved this one. The TypeError is pretty self-explanatory, and I'm ashamed I didn't recognize it before posting my question. I was passing the OneHotEncoder class itself to ColumnTransformer, when it needs an instance of that class. Creating the instance first, as in the code below, resolved the problem. Thank you!

import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
oHe = OneHotEncoder()  # an instance, not the class itself
ct = ColumnTransformer(transformers=[('encode', oHe, [0])], remainder='passthrough')
dataset = np.array(ct.fit_transform(data))
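
For anyone who wants to try the fix without the original data, here is a minimal self-contained sketch. The `data` array is an illustrative stand-in built from the two rows in the question, and it passes the raw strings straight to the encoder rather than label-encoding them first; both of those are assumptions of this sketch, not part of the original code.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Illustrative stand-in for the asker's data (values taken from the question's table).
data = np.array([["FRUIT", 15, "XCD"],
                 ["VEGGIES", 12, "ZYK"]], dtype=object)

# Note the parentheses: OneHotEncoder() is an instance, OneHotEncoder is the class.
ct = ColumnTransformer(
    transformers=[("encode", OneHotEncoder(), [0])],
    remainder="passthrough",
)

# Column 0 is one-hot encoded; the remaining columns are passed through unchanged.
dataset = np.array(ct.fit_transform(data))
print(dataset.shape)
print(dataset)
```

Writing `OneHotEncoder()` inline is equivalent to creating `oHe` on its own line; either way, what ColumnTransformer receives is an object it can clone.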
Kyle Papili answered Oct 19 '22 14:10


I faced a similar issue when fitting RandomizedSearchCV with xgboost. Like the poster above, I felt ashamed for not spotting such a simple error. I had typed

regressor = xgboost.XGBRegressor

instead of

regressor = xgboost.XGBRegressor()

After reading this thread, I spent some time tracking down the mistake, and once I added the parentheses it worked fine.
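
The search classes clone the estimator they are given, which is why a missing pair of parentheses produces the same TypeError there. Below is a rough sketch of the underlying behavior that calls `sklearn.base.clone` directly. It uses `Ridge` as a stand-in for `xgboost.XGBRegressor` so that it runs with scikit-learn alone, and `try_clone` is a hypothetical helper written just for this illustration.

```python
import inspect

from sklearn.base import clone
from sklearn.linear_model import Ridge  # stand-in for xgboost.XGBRegressor


def try_clone(estimator):
    """Return 'ok' if scikit-learn can clone the object, else the TypeError text."""
    try:
        clone(estimator)
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


# Passing the class itself (no parentheses) cannot be cloned.
class_result = try_clone(Ridge)

# Passing an instance (with parentheses) clones without error.
instance_result = try_clone(Ridge())

print("class:   ", class_result)
print("instance:", instance_result)

# A quick way to spot the mistake before fitting anything.
print("Ridge is a class:", inspect.isclass(Ridge))
print("Ridge() is a class:", inspect.isclass(Ridge()))
```

An `inspect.isclass` check like the last two lines can be a handy sanity test when an estimator is built somewhere far from where it is fitted.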

Jay answered Oct 19 '22 13:10