Imputer reduces the size of columns in my dataframe

Tags: pandas, machine-learning, scikit-learn, sklearn-pandas

print(np.shape(ar_fulldata_input_xx))

Output: (9027, 1443)

Now I use Imputer to impute the missing values of my dataframe ar_fulldata_input_xx as follows.

fill_NaN = Imputer(missing_values=np.nan, strategy='mean', axis=0)
imputed_DF = pd.DataFrame(fill_NaN.fit_transform(ar_fulldata_input_xx))

Now I check the size of my imputed dataframe as follows.

print(np.shape(imputed_DF))

Output: (9027, 1442)

Why is the number of columns reduced by one?

Is there any way I can find which column is missing after imputation?

I have already run the following lines of code to remove all columns that consist entirely of NaN values or entirely of 0 values.

ar_fulldata_input_xx = ar_fulldata_input_xx.loc[:, (ar_fulldata_input_xx != 0).any(axis=0)]

and

ar_fulldata_input_xx = ar_fulldata_input_xx.dropna(axis=1, how='all')


asked Feb 19 '18 02:02

Abdul Karim Khan

1 Answer

You can do it in pandas using this:

ndf = df.fillna(df.mean())
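To illustrate why this keeps the column count intact, here is a minimal sketch on a toy frame. The data and column names are made up for the example and are not the OP's. A column that is entirely NaN has no mean, so it stays NaN but is not dropped.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): column "c" has no usable values at all.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
    "c": [np.nan, np.nan, np.nan],
})

# Fill each column's gaps with that column's mean.
ndf = df.fillna(df.mean())

print(df.shape, ndf.shape)   # both shapes have the same number of columns
print(ndf.isna().sum())      # "c" is still all NaN, since its mean is NaN
```

If the frame also holds non-numeric columns, df.mean() may need numeric_only=True, depending on the pandas version.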

It seems that one of the columns was not importing its numeric values properly from the original file, which is likely why the Imputer did not behave as expected. The OP is looking into it.
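For anyone hitting the same thing, a rough way to find the suspect column is to look for columns that are not numeric, or that have no usable values once coerced to numbers. As far as I know, the old Imputer with axis=0 discards columns that contained only missing values at fit time, which would explain a lost column. The sketch below uses a made-up frame as a stand-in for ar_fulldata_input_xx, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ar_fulldata_input_xx.
df = pd.DataFrame({
    "ok": [1.0, np.nan, 3.0],
    "bad_import": ["1,5", "2,7", None],       # numbers that were read in as text
    "empty": [np.nan, np.nan, np.nan],
})

# Columns whose dtype is not numeric: likely import problems.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()

# Coerce everything to numbers; values that cannot be parsed become NaN.
coerced = df.apply(pd.to_numeric, errors="coerce")

# Columns with no usable values: no mean can be computed for them.
no_mean = coerced.columns[coerced.isna().all()].tolist()

print("non-numeric columns:", non_numeric)
print("columns without a mean:", no_mean)
```

Comparing the columns flagged here against the original column list should point to the one that disappears after imputation.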


answered Oct 10 '22 23:10

joaoavf
