Imputer reduces the number of columns in my dataframe

print(np.shape(ar_fulldata_input_xx))

Output: (9027, 1443)

Now I use Imputer to impute the missing values of my dataframe ar_fulldata_input_xx as follows.

fill_NaN = Imputer(missing_values=np.nan, strategy='mean', axis=0)
imputed_DF = pd.DataFrame(fill_NaN.fit_transform(ar_fulldata_input_xx))

Now I check the size of my imputed dataframe as follows.

print(np.shape(imputed_DF))

Output: (9027, 1442)

Why is the number of columns reduced by one?

Is there any way I can find out which column is missing after the impute step?

I have run the following lines of code to remove all columns that contain only NaN values or only 0 values.

ar_fulldata_input_xx = ar_fulldata_input_xx.loc[:, (ar_fulldata_input_xx != 0).any(axis=0)]

and

ar_fulldata_input_xx=ar_fulldata_input_xx.dropna(axis=1, how='all')
Abdul Karim Khan asked Feb 19 '18 02:02


1 Answer

You can do the imputation directly in pandas:

ndf = df.fillna(df.mean())
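As a minimal sketch of what that line does, the example below uses a small made-up dataframe (the names `df`, `x` and `y` are hypothetical, not the OP's data). Each missing value is replaced by the mean of its own column, and because the result is still a dataframe, the shape and the column labels are preserved, assuming every column is numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataframe.
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "y": [np.nan, 4.0, 6.0],
})

# df.mean() gives one mean per column (NaN values are skipped);
# fillna() aligns those means to the columns by label.
ndf = df.fillna(df.mean())

print(df.shape, ndf.shape)
print(ndf)
```

Comparing `df.shape` with `ndf.shape` is a quick way to confirm that no column was lost during imputation.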

It seems that one of the columns did not import its numeric values properly from the original file, which is likely why the Imputer did not work as expected. The OP is looking into it.
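One way to look for such a column is sketched below. It relies on an assumption that is not confirmed for the OP's data: that the badly imported column ends up with no usable numeric value at all, so no mean can be computed for it. The scikit-learn documentation notes that a mean imputer discards columns that contained only missing values at fit time, which would shrink the output by one column. The dataframe and its column names here are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: column "b" was read as text.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["n/a", "n/a", "n/a"],
    "c": [0.0, 2.0, np.nan],
})

# Columns whose dtype is not numeric are the first suspects.
non_numeric_cols = df.select_dtypes(exclude="number").columns.tolist()

# Force everything to numeric; values that cannot be parsed become NaN.
coerced = df.apply(pd.to_numeric, errors="coerce")

# Columns with no observed value left have no mean to impute with.
all_nan_cols = coerced.columns[coerced.isna().all(axis=0)].tolist()

print("non-numeric columns:", non_numeric_cols)
print("columns with only NaN after coercion:", all_nan_cols)
```

Running the same two checks on the dataframe immediately before the imputation call, and comparing the column labels before and after, should point to the column that disappears.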

joaoavf answered Oct 10 '22 23:10