I have a pandas dataframe that has some NaN values in a particular column:
1291 NaN
1841 NaN
2049 NaN
Name: some column, dtype: float64
And I have made the following pipeline in order to deal with it:
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
scaler = StandardScaler(with_mean=True)
imputer = SimpleImputer(strategy='median')
logistic = LogisticRegression()
pipe = Pipeline([('imputer', imputer),
                 ('scaler', scaler),
                 ('logistic', logistic)])
Now when I pass this pipeline to a RandomizedSearchCV, I get the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
It's actually quite a bit longer than that; I can post the entire error in an edit if necessary. Anyway, I am quite sure that this column is the only column that contains NaNs. Moreover, if I switch from SimpleImputer to the (now deprecated) Imputer in the pipeline, the pipeline works just fine in my RandomizedSearchCV. I checked the documentation, but it seems that SimpleImputer is supposed to behave in (nearly) the exact same way as Imputer. What is the difference in behavior? How do I use an imputer in my pipeline without using the deprecated Imputer?
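The error message mentions infinity as well as NaN, and SimpleImputer with its default missing_values=np.nan only replaces NaN, so one thing that may be worth ruling out is a stray infinite value. Below is a minimal sketch of that check, followed by the pipeline from the question running inside RandomizedSearchCV. The synthetic data, the column names "a" and "b", and the parameter grid are assumptions made up for illustration; they are not from the original question, and this is not a confirmed diagnosis of the error.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): two numeric columns with a few NaNs in "b".
rng = np.random.RandomState(0)
X = pd.DataFrame({"a": rng.normal(size=100), "b": rng.normal(size=100)})
y = (X["a"] + X["b"] > 0).astype(int)
X.loc[[3, 17, 42], "b"] = np.nan

# Count NaN and +/-inf separately per column. The imputer below only
# fills NaN, so any infinity would still reach the scaler and the model.
nan_counts = X.isna().sum()
inf_counts = pd.Series(np.isinf(X.to_numpy()).sum(axis=0), index=X.columns)

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler(with_mean=True)),
    ("logistic", LogisticRegression()),
])

# Hypothetical search space: only the regularisation strength is tuned.
search = RandomizedSearchCV(
    pipe,
    param_distributions={"logistic__C": [0.01, 0.1, 1.0, 10.0]},
    n_iter=3,
    cv=3,
    random_state=0,
)
search.fit(X, y)

print(nan_counts.to_dict(), inf_counts.to_dict())
print(search.best_params_)
```

If the infinity count is non-zero on the real data, replacing those values with NaN before the pipeline (for example with DataFrame.replace) would let the imputer handle them too.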
SimpleImputer in make_pipeline
preprocess_pipeline = make_pipeline(
    FeatureUnion(transformer_list=[
        ('Handle numeric columns', make_pipeline(
            ColumnSelector(columns=['Amount']),
            SimpleImputer(strategy='constant', fill_value=0),
            StandardScaler()
        )),
        ('Handle categorical data', make_pipeline(
            ColumnSelector(columns=['Type', 'Name', 'Changes']),
            SimpleImputer(strategy='constant', missing_values=' ', fill_value='missing_value'),
            OneHotEncoder(sparse=False)
        ))
    ])
)
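ColumnSelector is not a scikit-learn class, and the snippet above does not show where it comes from. A minimal sketch, assuming it is a user-defined transformer that simply keeps the named DataFrame columns, could look like the following. The class body and the three-row sample frame are hypothetical; only the numeric branch of the snippet is exercised here.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical helper: keep only the named DataFrame columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; selection is stateless.
        return self

    def transform(self, X):
        # Indexing with a list returns a DataFrame, which keeps the
        # output two-dimensional for the downstream transformers.
        return X[self.columns]


# Made-up sample data with one missing Amount.
df = pd.DataFrame({"Amount": [10.0, np.nan, 30.0], "Type": ["a", "b", "a"]})

numeric_branch = make_pipeline(
    ColumnSelector(columns=["Amount"]),
    SimpleImputer(strategy="constant", fill_value=0),
    StandardScaler(),
)
out = numeric_branch.fit_transform(df)
print(out.shape)
```

Putting the selector first in each branch means every imputer only ever sees columns of one kind, which is what allows a numeric fill value in one branch and a string fill value in the other.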
SimpleImputer in Pipeline
('features', FeatureUnion([
    ('Numerics', Pipeline([
        ('Numeric Extractor', TypeSelector(np.number)),
        ('Impute Zero', SimpleImputer(strategy="constant", fill_value=0))
    ])),
    ('Cat Columns', Pipeline([
        ('Category Extractor', TypeSelector("category")),
        ('Impute Missing', SimpleImputer(strategy="constant", fill_value='missing'))
    ]))
]))
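TypeSelector is likewise not part of scikit-learn. A minimal sketch, assuming it wraps pandas' DataFrame.select_dtypes so that each branch receives only columns of one dtype, is shown below. The class body and the sample frame are hypothetical, and only the numeric branch is run through the imputer.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline


class TypeSelector(BaseEstimator, TransformerMixin):
    """Hypothetical helper: keep only DataFrame columns of a given dtype."""

    def __init__(self, dtype):
        self.dtype = dtype

    def fit(self, X, y=None):
        # Nothing to learn; selection is stateless.
        return self

    def transform(self, X):
        return X.select_dtypes(include=[self.dtype])


# Made-up sample data: one float column with a NaN, one categorical column.
df = pd.DataFrame({
    "amount": [1.0, np.nan, 3.0],
    "kind": pd.Categorical(["x", "y", "x"]),
})

cat_cols = TypeSelector("category").transform(df)

numeric_pipe = Pipeline([
    ("Numeric Extractor", TypeSelector(np.number)),
    ("Impute Zero", SimpleImputer(strategy="constant", fill_value=0)),
])
filled = numeric_pipe.fit_transform(df)

print(list(cat_cols.columns))
print(filled.ravel().tolist())
```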