print(np.shape(ar_fulldata_input_xx))
Output: (9027, 1443)
Now I use Imputer to impute the missing values of my dataframe ar_fulldata_input_xx as follows.
fill_NaN = Imputer(missing_values=np.nan, strategy='mean', axis=0)
imputed_DF = pd.DataFrame(fill_NaN.fit_transform(ar_fulldata_input_xx))
Now I check the size of my imputed dataframe as follows.
print(np.shape(imputed_DF))
Output: (9027, 1442)
Why is the number of columns reduced by one?
Is there any way to find which column is missing after the imputation?
I have run the following lines of code to remove all columns that contain only NaN values or only 0 values.
ar_fulldata_input_xx = ar_fulldata_input_xx.loc[:, (ar_fulldata_input_xx != 0).any(axis=0)]
and
ar_fulldata_input_xx=ar_fulldata_input_xx.dropna(axis=1, how='all')
The imputer is an estimator that fills in the missing values in a dataset. For numerical values, it can use the mean, the median, or a constant. For categorical values, it can use the most frequent value or a constant. You can also train a model to predict the missing values.
The imputation strategy. If “mean”, then replace missing values using the mean along the axis. If “median”, then replace missing values using the median along the axis. If “most_frequent”, then replace missing using the most frequent value along the axis.
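To see which column disappears, one option is to inspect the fitted imputer. This is a minimal sketch, assuming a recent scikit-learn where SimpleImputer (in sklearn.impute) replaces the older Imputer class; the toy frame and its column names are made up for illustration. With the mean strategy, a column that has no observed values at all gets NaN in statistics_ and is left out of the transformed output, which would explain a column count that shrinks by one.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy frame: column "b" contains only missing values.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],
    "c": [4.0, 5.0, np.nan],
})

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed = imputer.fit_transform(df)

# statistics_ holds one fill value per input column. It is NaN for a
# column with no observed values, and such a column is dropped.
empty_mask = np.isnan(imputer.statistics_)
dropped = df.columns[empty_mask].tolist()
kept = df.columns[~empty_mask]

# Rebuild a labelled frame so the surviving columns keep their names.
imputed_df = pd.DataFrame(imputed, columns=kept, index=df.index)

print(df.shape, imputed_df.shape)
print("dropped columns:", dropped)
```

Passing the kept column names back into pd.DataFrame also avoids the 0..n-1 integer labels that appear when the array is wrapped without them.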
You can do the same mean imputation in pandas with:
ndf = df.fillna(df.mean())
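One difference worth knowing: the pandas version never drops a column. A small sketch on a made-up frame (the names df, ndf and still_missing are just for illustration) shows that a column with no observed values keeps its place and simply stays NaN, because its mean is NaN too.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: column "b" has no observed values.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],
})

# Fill each column with its own mean; the mean of "b" is NaN.
ndf = df.fillna(df.mean())

# Columns that still contain missing values after filling.
still_missing = ndf.columns[ndf.isna().any()].tolist()

print(ndf.shape)
print("still missing:", still_missing)
```

So the shape is preserved, but any column listed in still_missing needs separate handling.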
It seems that one of the columns was not importing its numeric values properly from the original file, which is likely the reason the Imputer did not behave as expected. OP is taking a look at it.
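If a badly imported column is the suspect, a quick check is to look for columns that are not numeric and see what is left after converting them. This is only a sketch on a made-up frame; the column names and the "n/a" token are assumptions, not taken from the OP's file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: "c" was read as text because of a stray token.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "c": ["4", "n/a", "6"],
})

# Columns whose dtype is not numeric are candidates for a bad import.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()

# Coerce them to numbers; entries that cannot be parsed become NaN.
fixed = df.copy()
for col in non_numeric:
    fixed[col] = pd.to_numeric(fixed[col], errors="coerce")

# Columns with no observed values at all after the conversion.
all_missing = fixed.columns[fixed.isna().all()].tolist()

print("non-numeric columns:", non_numeric)
print("entirely missing after conversion:", all_missing)
```

Any column that ends up in all_missing has nothing to compute a mean from, so it is worth fixing the import before imputing.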