 

python imblearn make_pipeline TypeError: Last step of Pipeline should implement fit

I am trying to use imblearn's SMOTE inside a Pipeline. My data set is text stored in a pandas DataFrame. Here is the code snippet:

text_clf = Pipeline([('vect', TfidfVectorizer()), ('scale', StandardScaler(with_mean=False)), ('smt', SMOTE(random_state=5)), ('clf', LinearSVC(class_weight='balanced'))])

After this I am using GridSearchCV.

grid = GridSearchCV(text_clf, parameters, cv=4, n_jobs=-1, scoring = 'accuracy') 

Here parameters holds the tuning parameters, mostly for TfidfVectorizer(). I am getting the following error:

 All intermediate steps should be transformers and implement fit and transform. 'SMOTE

After this error, I changed the code as follows.

vect = TfidfVectorizer(use_idf=True,smooth_idf = True, max_df = 0.25, sublinear_tf = True, ngram_range=(1,2))
X = vect.fit_transform(X).todense()
Y = vect.fit_transform(Y).todense()
X_Train,X_Test,Y_Train,y_test = train_test_split(X,Y, random_state=0, test_size=0.33, shuffle=True)
text_clf =make_pipeline([('smt', SMOTE(random_state=5)),('scale', StandardScaler(with_mean=False)),('clf', LinearSVC(class_weight='balanced'))])
grid = GridSearchCV(text_clf, parameters, cv=4, n_jobs=-1, scoring = 'accuracy')

Here parameters only tunes C in the SVC classifier. This time I am getting the following error:

Last step of Pipeline should implement fit. SMOTE(....) doesn't

What is going on here? Can anyone please help?

pythondumb asked Nov 02 '18 07:11



1 Answer

imblearn's SMOTE (imblearn.over_sampling.SMOTE) has no transform method; see the imbalanced-learn documentation.

A scikit-learn Pipeline, however, requires every step except the last to implement both fit and transform.

To use SMOTE with an sklearn pipeline, you would have to implement a custom transformer that calls SMOTE.fit_sample() (renamed fit_resample() in newer imbalanced-learn releases) in its transform method.

An easier option is to use the imblearn pipeline instead:

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as imbPipeline
from sklearn.svm import LinearSVC

# This doesn't work with sklearn.pipeline.Pipeline because
# SMOTE doesn't have a .transform() method.
# (Older releases expose .fit_sample() or .sample();
# imblearn 0.4+ uses .fit_resample().)
pipe = imbPipeline([
    # ...
    ('oversample', SMOTE(random_state=5)),
    ('clf', LinearSVC(class_weight='balanced'))
])
x3mka answered Nov 17 '22 07:11
