I have a pandas dataframe that has some NaN values in a particular column:
1291 NaN
1841 NaN
2049 NaN
Name: some column, dtype: float64
And I have made the following pipeline in order to deal with it:
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler(with_mean=True)
imputer = SimpleImputer(strategy='median')
logistic = LogisticRegression()

# Impute first so the scaler and the classifier never see NaN.
pipe = Pipeline([('imputer', imputer),
                 ('scaler', scaler),
                 ('logistic', logistic)])
Now when I pass this pipeline to a RandomizedSearchCV, I get the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
It's actually quite a bit longer than that; I can post the entire error in an edit if necessary. Anyway, I am quite sure that this column is the only column that contains NaNs. Moreover, if I switch from SimpleImputer to the (now deprecated) Imputer in the pipeline, the pipeline works just fine in my RandomizedSearchCV. I checked the documentation, and it seems that SimpleImputer is supposed to behave in (nearly) the same way as Imputer. What is the difference in behavior? How do I use an imputer in my pipeline without using the deprecated Imputer?
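Since the error message mentions infinity as well as NaN, and SimpleImputer only fills entries equal to missing_values (NaN by default), it may be worth checking every numeric column for both before blaming the imputer. Below is a minimal diagnostic sketch, not a confirmed explanation of the error above. The helper name count_non_finite and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def count_non_finite(df):
    """Return per-column counts of NaN and +/-inf for the numeric columns of df."""
    numeric = df.select_dtypes(include=[np.number])
    return pd.DataFrame({
        "n_nan": numeric.isna().sum(),
        "n_inf": np.isinf(numeric).sum(),
    })


# Hypothetical data: one column with NaN, another with an infinite value.
X = pd.DataFrame({
    "some column": [1.0, np.nan, 3.0, np.nan],
    "other column": [0.5, np.inf, 2.0, 1.5],
})

report = count_non_finite(X)
print(report)
```

If n_inf is non-zero for any column, one option is to turn those entries into NaN first, for example with X.replace([np.inf, -np.inf], np.nan), so that the imputer can fill them.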
SimpleImputer in make_pipeline
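from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# ColumnSelector is not part of scikit-learn; it is assumed here to be a
# user-defined transformer that returns the named DataFrame columns.
preprocess_pipeline = make_pipeline(
    FeatureUnion(transformer_list=[
        ('Handle numeric columns', make_pipeline(
            ColumnSelector(columns=['Amount']),
            SimpleImputer(strategy='constant', fill_value=0),
            StandardScaler()
        )),
        ('Handle categorical data', make_pipeline(
            ColumnSelector(columns=['Type', 'Name', 'Changes']),
            SimpleImputer(strategy='constant', missing_values=' ', fill_value='missing_value'),
            # In scikit-learn 1.2+ this argument is named sparse_output.
            OneHotEncoder(sparse=False)
        ))
    ])
)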
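The snippet above relies on a ColumnSelector that scikit-learn does not provide. Here is a minimal sketch of what such a transformer might look like, assuming it receives a pandas DataFrame and takes a columns argument. The class body and the sample data are assumptions for illustration, not the original author's code.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class ColumnSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: select the named columns from a DataFrame."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; the selection is fixed by the constructor argument.
        return self

    def transform(self, X):
        return X[self.columns]


# Hypothetical data with one missing amount.
df = pd.DataFrame({"Amount": [10.0, np.nan, 30.0], "Type": ["a", "b", "a"]})

numeric = make_pipeline(
    ColumnSelector(columns=["Amount"]),
    SimpleImputer(strategy="constant", fill_value=0),
    StandardScaler(),
)
out = numeric.fit_transform(df)
print(out.shape)
```

Because the imputer runs before the scaler, the scaled output contains no NaN.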
SimpleImputer in Pipeline
# TypeSelector is assumed to be a user-defined transformer that keeps only
# the DataFrame columns of the given dtype; it is not part of scikit-learn.
('features', FeatureUnion([
    ('Numerics', Pipeline([
        ('Numeric Extractor', TypeSelector(np.number)),
        ('Impute Zero', SimpleImputer(strategy="constant", fill_value=0))
    ])),
    ('Cat Columns', Pipeline([
        ('Category Extractor', TypeSelector("category")),
        ('Impute Missing', SimpleImputer(strategy="constant", fill_value='missing'))
    ]))
]))
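TypeSelector is likewise not a scikit-learn class. A minimal sketch of one possible implementation follows, assuming it wraps DataFrame.select_dtypes. The class body, the sample data and the branch names are hypothetical, and the two branches are run separately here rather than combined in a FeatureUnion.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline


class TypeSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: keep only the DataFrame columns of a given dtype."""

    def __init__(self, dtype):
        self.dtype = dtype

    def fit(self, X, y=None):
        # Stateless: the dtype filter is applied at transform time.
        return self

    def transform(self, X):
        return X.select_dtypes(include=[self.dtype])


# Hypothetical data: one numeric and one categorical column, each with a gap.
df = pd.DataFrame({
    "amount": [1.0, np.nan, 3.0],
    "kind": pd.Categorical(["a", None, "b"]),
})

numeric_branch = Pipeline([
    ("Numeric Extractor", TypeSelector(np.number)),
    ("Impute Zero", SimpleImputer(strategy="constant", fill_value=0)),
])
category_branch = Pipeline([
    ("Category Extractor", TypeSelector("category")),
    ("Impute Missing", SimpleImputer(strategy="constant", fill_value="missing")),
])

num_out = numeric_branch.fit_transform(df)
cat_out = category_branch.fit_transform(df)
print(num_out)
print(cat_out)
```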