 

Predicting missing values with scikit-learn's Imputer module

I am writing a very basic program to predict missing values in a dataset using scikit-learn's Imputer class.

I have made a NumPy array, created an Imputer object with strategy='mean' and performed fit_transform() on the NumPy array.

When I print the array after performing fit_transform(), the NaNs remain and I don't get any predictions.

What am I doing wrong here? How do I go about predicting the missing values?

import numpy as np
from sklearn.preprocessing import Imputer

X = np.array([[23.56],[53.45],['NaN'],[44.44],[77.78],['NaN'],[234.44],[11.33],[79.87]])

print(X)

imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
imp.fit_transform(X)

print(X)
asked Jul 29 '14 at 14:07 by xennygrimmato


People also ask

Can Sklearn handle missing values?

The scikit-learn library provides three mechanisms for dealing with missing values: univariate feature imputation (SimpleImputer), multivariate feature imputation (IterativeImputer), and nearest-neighbors imputation (KNNImputer).
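Multivariate imputation is the least obvious of these mechanisms, because each missing value is estimated from the other features rather than from its own column alone. A minimal sketch, assuming scikit-learn 0.21 or later (where IterativeImputer exists but is still flagged as experimental) and a small made-up matrix:

```python
import numpy as np
# IterativeImputer is still experimental, so this enabling import must come first.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Made-up data: the second column is roughly twice the first.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.1],
    [3.0, np.nan],
    [np.nan, 8.0],
    [5.0, 9.9],
])

# Each feature that has missing values is modelled as a regression on the
# other features, and the round-robin fit is repeated up to max_iter times.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)  # returns a new array; X is not modified

print(X_filled)
```

Only the missing entries are replaced; the observed values pass through unchanged.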

What is Imputer in Sklearn?

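Imputer was scikit-learn's transformer for filling in missing values (deprecated in 0.20 and removed in 0.22 in favor of sklearn.impute.SimpleImputer). Its strategy parameter selects the imputation strategy: if "mean", missing values are replaced with the mean along the axis; if "median", with the median along the axis; if "most_frequent", with the most frequent value along the axis.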
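The three strategies can be reproduced by hand for a single column (axis=0 in the old Imputer meant "per column"). This is a plain-NumPy sketch with made-up values that mirrors what each strategy computes; it is not Imputer's actual implementation. Ties for "most_frequent" are broken here by taking the smallest value, which matches scikit-learn's documented behavior.

```python
import numpy as np

# One feature column with two missing entries (values made up for illustration).
col = np.array([1.0, 2.0, np.nan, 2.0, 7.0, np.nan])
missing = np.isnan(col)
observed = col[~missing]  # [1., 2., 2., 7.]

mean_fill = observed.mean()        # (1 + 2 + 2 + 7) / 4
median_fill = np.median(observed)  # average of the two middle values
# np.unique returns sorted values, and argmax picks the first maximum,
# so a tie resolves to the smallest value.
values, counts = np.unique(observed, return_counts=True)
most_frequent_fill = values[np.argmax(counts)]

filled = {}
for name, fill in [("mean", mean_fill), ("median", median_fill),
                   ("most_frequent", most_frequent_fill)]:
    # Keep observed entries, substitute the statistic where the value is missing.
    filled[name] = np.where(missing, fill, col)
    print(name, filled[name])
```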

How does Sklearn Knn Imputer work?

Imputation for completing missing values using k-Nearest Neighbors. Each sample's missing values are imputed using the mean value from n_neighbors nearest neighbors found in the training set. Two samples are close if the features that neither is missing are close.
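A small sketch of KNNImputer, assuming scikit-learn 0.22 or later and a made-up 4x3 matrix. Distances use the nan_euclidean metric, which only compares coordinates present in both rows and scales the result up to compensate for the skipped ones.

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# With uniform weights, each missing entry becomes the plain average of that
# feature over the 2 nearest rows that actually have the feature.
imputer = KNNImputer(n_neighbors=2, weights="uniform")
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Row 0 is missing its third feature; the two nearest rows that have it are rows 1 and 2, so it receives (3 + 5) / 2 = 4.0. Row 2 is missing its first feature; rows 1 and 3 are nearest, giving (3 + 8) / 2 = 5.5.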

Which Sklearn function can be used for imputing missing data?

We can use the SimpleImputer class from scikit-learn (sklearn.impute.SimpleImputer) to replace missing values with a fill value. Its strategy parameter offers four imputation methods: strategy='mean' replaces missing values with the mean of the column, 'median' with the column median, 'most_frequent' with the most frequent value in the column, and 'constant' with the value given by fill_value.
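A sketch comparing the four strategies on a made-up two-column array; the fill_value of -1.0 is an arbitrary choice for illustration and is ignored by every strategy except "constant".

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [3.0, 60.0],
])

filled = {}
for strategy in ["mean", "median", "most_frequent", "constant"]:
    # Statistics are computed per column from the non-missing values only.
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy, fill_value=-1.0)
    filled[strategy] = imputer.fit_transform(X)
    # Print the two imputed cells: row 1 / column 0 and row 2 / column 1.
    print(strategy, filled[strategy][1, 0], filled[strategy][2, 1])
```

In the second column every observed value is unique, so "most_frequent" falls back to the smallest one (10.0).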


1 Answer

Per the documentation, sklearn.preprocessing.Imputer.fit_transform returns a new array; it does not modify the array passed to it. The minimal fix is therefore:

X = imp.fit_transform(X)
answered Oct 02 '22 at 16:10 by jonrsharpe
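sklearn.preprocessing.Imputer no longer exists in scikit-learn 0.22 and later. A sketch of the question's program on a current version, assuming sklearn.impute.SimpleImputer as the replacement. It also swaps the 'NaN' strings for np.nan: a list mixing floats and strings makes NumPy build a string array, whereas np.nan keeps the array numeric from the start.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Same values as the question, with real np.nan instead of the string 'NaN'.
X = np.array([[23.56], [53.45], [np.nan], [44.44], [77.78],
              [np.nan], [234.44], [11.33], [79.87]])

imp = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imp.fit_transform(X)  # a new array is returned...

print(X)               # ...so X itself still contains nan
print(X_filled)        # nan replaced by the column mean
print(imp.statistics_) # the fitted mean of the 7 observed values, about 74.98
```

Reassigning with X = imp.fit_transform(X) works just as well if the original array is no longer needed.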