
Replace missing values in all columns except one in pandas dataframe

Tags: python, pandas

I have a pandas DataFrame with 10 columns, and I want to fill missing values in all columns except one (let's say that column is called test). Currently, I do this:

df.fillna(df.median(), inplace=True)

This replaces NA values in every column with that column's median. How do I exclude specific column(s) without listing ALL the other columns?

user308827 asked Mar 21 '17 01:03



2 Answers

You can use pd.DataFrame.drop to leave the column out before filling:

# pandas 2.0+ no longer accepts axis positionally, so name the column via columns=
df.drop(columns='unwanted_column').fillna(df.median())

Or pd.Index.difference

df.loc[:, df.columns.difference(['unwanted_column'])].fillna(df.median())

Or just select the same columns with a boolean mask on the column labels:

df.loc[:, df.columns != 'unwanted_column']

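Note that each of these expressions returns a new object containing only the selected columns; df itself is not modified, so assign the filled result back to those columns if the excluded column has to stay in df. The argument to difference must be list-like (for example ['unwanted_column']), not a bare string.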
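Since the selections above return new objects, here is a minimal sketch of writing the filled values back into the original frame. The data and the column names a, b and test are made up for illustration; only drop, Index.difference, fillna and median come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'test' is the column that must be left untouched.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 4.0, 6.0],
    "test": [np.nan, 1.0, np.nan],
})

# Labels of every column except 'test' (difference expects a list-like).
cols = df.columns.difference(["test"])

# Fill only those columns with their own medians and assign the result back,
# so that 'test' stays in df with its missing values intact.
df[cols] = df[cols].fillna(df[cols].median())

print(df)
```

Computing the median on df[cols] rather than on df keeps the excluded column out of the calculation entirely, which may matter if that column is not numeric.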

piRSquared answered Sep 24 '22 03:09



Just select whatever columns you want using pandas' column indexing:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'A': [np.nan, 5, 2, np.nan, 3], 'B': [np.nan, 4, 3, 5, np.nan], 'C': [np.nan, 4, 3, 2, 1]})
>>> df
     A    B    C
0  NaN  NaN  NaN
1  5.0  4.0  4.0
2  2.0  3.0  3.0
3  NaN  5.0  2.0
4  3.0  NaN  1.0
>>> cols = ['A', 'B']
>>> df[cols] = df[cols].fillna(df[cols].median())
>>> df
     A    B    C
0  3.0  4.0  NaN
1  5.0  4.0  4.0
2  2.0  3.0  3.0
3  3.0  5.0  2.0
4  3.0  4.0  1.0
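If listing the columns to keep is the inconvenient part, one possible variation is to drop the excluded label from the medians instead. This sketch relies on the documented fillna behavior that, when given a Series, only columns whose labels appear in it are filled. It reuses the sample frame from this answer; the names fill_values and filled are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the session above.
df = pd.DataFrame({
    "A": [np.nan, 5, 2, np.nan, 3],
    "B": [np.nan, 4, 3, 5, np.nan],
    "C": [np.nan, 4, 3, 2, 1],
})

# Per-column medians, with the excluded column ('C' here) removed.
fill_values = df.median().drop("C")

# Columns missing from fill_values are left as they are.
filled = df.fillna(fill_values)

print(filled)
```

This stays close to the one-liner in the question: only the Series passed to fillna changes, and no column subset has to be assigned back.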
blacksite answered Sep 23 '22 03:09
