
Using Simple Imputer with Pandas dataframe?

I am working with the following dataset:

https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data

The dataset description says it has some missing values, marked with "?". I want to apply SimpleImputer to it, and my code is the following:

file="breast_cancer"
df=pd.read_csv(file,names=['id', 'clump_thickness','unif_cell_size',
                                                         'unif_cell_shape', 'marg_adhesion', 'single_epith_cell_size',
                                                         'bare_nuclei', 'bland_chromatin', 'normal_nucleoli','mitoses','class'])
df.replace('?',np.NaN,inplace=True)
imp=SimpleImputer(missing_values="NaN")
idf=pd.DataFrame(imp.fit_transform(df))
idf.columns=df.columns
idf.index=df.index

I want to replace every ? value in every column with the column mean and get back a new dataframe. The problem is that I get the following error:

Input contains NaN, infinity or a value too large for dtype('float64').

What am I missing?

Little asked Oct 18 '25 07:10


1 Answer

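You are telling SimpleImputer to look for "NaN", which is a str, whereas you have replaced ? with np.NaN, which is a float. The imputer therefore does not treat the real NaN values as missing, and input validation rejects them.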
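The mismatch can be reproduced on a toy column. This is only a minimal sketch: the frame `toy` and its values are made up for illustration, and the exact wording of the error message depends on the installed scikit-learn version.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy column holding one real missing value (a float NaN),
# which is what df.replace('?', np.nan) leaves behind.
toy = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

# "NaN" is an ordinary string, while np.nan is a float.
print(isinstance("NaN", str), isinstance(np.nan, float))

try:
    # The imputer is told to look for the string "NaN", so the float NaN
    # in the data is not treated as missing and validation rejects it.
    SimpleImputer(missing_values="NaN").fit_transform(toy)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```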

Instantiate SimpleImputer with missing_values=np.nan and it works fine:

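df.replace('?',np.nan,inplace=True)  # np.nan is the canonical spelling; the np.NaN alias was removed in NumPy 2.0
imp=SimpleImputer(missing_values=np.nan)  # matches the float NaN now in df; default strategy is 'mean'
idf=pd.DataFrame(imp.fit_transform(df))  # fit_transform returns a NumPy array, so rebuild the frame
idf.columns=df.columns
idf.index=df.index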

idf['bare_nuclei'].isna().sum()

Output:

0
# No nan : Imputing successful
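As an alternative to calling replace afterwards, pandas can turn "?" into NaN while reading the file, using the na_values parameter of read_csv. The sketch below is self-contained and makes some assumptions: the three rows in `sample` are made up to mimic the file layout (they are not real records from the dataset), and strategy="mean" is spelled out even though it is the default.

```python
import io

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up rows in the same 11-column layout as the dataset (not real records).
sample = io.StringIO(
    "1,5,1,1,1,2,1,3,1,1,2\n"
    "2,5,4,4,5,7,?,3,2,1,2\n"
    "3,3,1,1,1,2,3,3,1,1,4\n"
)

cols = ['id', 'clump_thickness', 'unif_cell_size', 'unif_cell_shape',
        'marg_adhesion', 'single_epith_cell_size', 'bare_nuclei',
        'bland_chromatin', 'normal_nucleoli', 'mitoses', 'class']

# na_values="?" makes pandas parse "?" as NaN, so bare_nuclei is read
# as a numeric column instead of a column of strings.
df = pd.read_csv(sample, names=cols, na_values="?")

# Replace each NaN with the mean of its column.
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
idf = pd.DataFrame(imp.fit_transform(df), columns=df.columns, index=df.index)

print(idf['bare_nuclei'].tolist())  # [1.0, 2.0, 3.0]
```

To run this against the real file, pass its path or URL to read_csv in place of `sample`. Note that this imputes every column, including id and class, which is harmless only as long as those columns have no missing values.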
Chris answered Oct 19 '25 20:10


