Scikit: Problem returning Dataframe from imputer instead of Numpy Array

Question

I am trying to impute missing values in a DataFrame using scikit-learn's IterativeImputer(). The imputer accepts a pandas DataFrame as input but returns a NumPy array instead of a DataFrame. Here is a simple example taken from this post.

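import numpy as np
import pandas as pd
# IterativeImputer is experimental and must be enabled explicitly before import
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Create an empty dataset
df = pd.DataFrame()

# Create two variables called x0 and x1. Make the first value of x1 a missing value
df['x0'] = [0.3051,0.4949,0.6974,0.3769,0.2231,0.341,0.4436,0.5897,0.6308,0.5]
df['x1'] = [np.nan,0.2654,0.2615,0.5846,0.4615,0.8308,0.4962,0.3269,0.5346,0.6731]

imputer = IterativeImputer(max_iter=10, random_state=42)
imputer.fit(df)
imputed_df = imputer.transform(df)  # returns a numpy.ndarray, not a DataFrame
imputed_df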

The problem is that the returned NumPy array loses the column names and other metadata. I can of course manually extract that metadata from the original DataFrame and reapply it, but that seems a bit hacky. Pandas has its own imputation via DataFrame.fillna(), but its algorithms are not as sophisticated as the scikit-learn ones.

So is there a way to fit the imputer to a DataFrame and get a DataFrame back from the result?

BENY · Accepted Answer

Yes, you can. Just assign the transformed values back into the DataFrame, which keeps its columns and index:

df[:] = imputer.transform(df)
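If overwriting the original DataFrame is not desirable, a non-destructive variant is sketched below. It is only an illustration, not part of the answer above: it rebuilds a new DataFrame from the returned array, reusing the original columns and index, and assumes the imputer returns the same number of columns it was given. The second option relies on the set_output API, which I believe is available from scikit-learn 1.2 onward, so it is guarded with a hasattr check. The names imputed_df and imputed_df_alt are made up for this example.

```python
import numpy as np
import pandas as pd
# IterativeImputer is experimental and must be enabled explicitly before import
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Same data as in the question
df = pd.DataFrame({
    "x0": [0.3051, 0.4949, 0.6974, 0.3769, 0.2231, 0.341, 0.4436, 0.5897, 0.6308, 0.5],
    "x1": [np.nan, 0.2654, 0.2615, 0.5846, 0.4615, 0.8308, 0.4962, 0.3269, 0.5346, 0.6731],
})

imputer = IterativeImputer(max_iter=10, random_state=42)

# Option 1: wrap the returned array in a new DataFrame, leaving df untouched.
# Assumes the output has as many columns as the input.
imputed_df = pd.DataFrame(
    imputer.fit_transform(df),
    columns=df.columns,
    index=df.index,
)

# Option 2 (assumes scikit-learn >= 1.2): ask the transformer for pandas output.
imputed_df_alt = None
if hasattr(imputer, "set_output"):
    imputer.set_output(transform="pandas")
    imputed_df_alt = imputer.fit_transform(df)

print(imputed_df.head())
```

Either way the original df still contains its NaN, which can be handy when comparing the data before and after imputation.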

Tags: python, pandas, dataframe, numpy, scikit-learn

krishnab
