I have a data frame with the column names shown below:
Column (Name)  Column Name 2  Column3  Column (4)
NULL           NULL           C3       100
22             C44            C55      NULL
2              C5             C11      13
I want to replace the null values in a subset of columns, say Column (Name) and Column (4), with the column mean and the column min, respectively. How can I do this? The values in Column (Name) and Column (4) are numeric.
df['Column (Name)']=df['Column (Name)'].fillna(df['Column (Name)'].mean())
df['Column (4)']=df['Column (4)'].fillna(df['Column (4)'].min())
I get the error below:
TypeError: can only concatenate str (not "int") to str
Expected output:
Column (Name)  Column Name 2  Column3  Column (4)
12             NULL           C3       100
22             C44            C55      13
2              C5             C11      13
Your error means that the column contains some non-numeric values, so it is stored as strings.
First check whether the columns are numeric by looking at df.dtypes; if they are not, they need to be converted:
print(df.dtypes)
Then you can check which values cannot be converted (values that are already missing are listed too):
print (df.loc[pd.to_numeric(df['Column (Name)'], errors='coerce').isna(), 'Column (Name)'])
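To separate values that fail to parse from values that were already missing, the mask can be combined with notna(). This is only a minimal sketch on a made-up column; the string 'abc' and the small frame are assumptions for illustration, not data from the question.

```python
import pandas as pd

# Hypothetical column that was loaded as strings instead of numbers.
df = pd.DataFrame({"Column (Name)": ["NULL", "22", "2", "abc"]})

# Unparseable entries become NaN instead of raising an error.
converted = pd.to_numeric(df["Column (Name)"], errors="coerce")

# Keep only entries that were present in the original but did not parse.
bad = df.loc[converted.isna() & df["Column (Name)"].notna(), "Column (Name)"]
print(bad.tolist())
```

Here the literal string 'NULL' shows up as an offending value, which is a hint that the placeholder was never turned into a real missing value when the data was loaded.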
Finally, convert the columns to numeric:
df['Column (Name)'] = pd.to_numeric(df['Column (Name)'], errors='coerce')
df['Column (4)'] = pd.to_numeric(df['Column (4)'], errors='coerce')
Or, if you want to convert multiple columns at once:
cols = ['Column (Name)','Column (4)']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
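As a quick check that the conversion did what you expect, you can rebuild the sample as strings and look at the dtypes afterwards. The frame below is an assumption about how the data might look when 'NULL' is stored as text.

```python
import pandas as pd

# Assumed starting point: every value, including 'NULL', is a string.
df = pd.DataFrame({
    "Column (Name)": ["NULL", "22", "2"],
    "Column Name 2": ["NULL", "C44", "C5"],
    "Column3": ["C3", "C55", "C11"],
    "Column (4)": ["100", "NULL", "13"],
})

cols = ["Column (Name)", "Column (4)"]
# Convert only the chosen columns; 'NULL' becomes NaN.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```

After this step both selected columns should be float64, because a column containing NaN cannot stay a plain integer column.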
And then use your solution:
df['Column (Name)']=df['Column (Name)'].fillna(df['Column (Name)'].mean())
df['Column (4)']=df['Column (4)'].fillna(df['Column (4)'].min())
Or you can compute both fill values at once with DataFrame.agg and pass the result to fillna:
df = df.fillna(df.agg({'Column (Name)':'mean', 'Column (4)':'min'}))
print (df)
   Column (Name) Column Name 2 Column3  Column (4)
0           12.0           NaN      C3       100.0
1           22.0           C44     C55        13.0
2            2.0            C5     C11        13.0
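For reference, here is a self-contained sketch of the agg approach on the sample data, assuming the two columns are already numeric. It also shows that columns not named in the dictionary are left untouched.

```python
import numpy as np
import pandas as pd

# Sample data with the numeric columns already stored as floats.
df = pd.DataFrame({
    "Column (Name)": [np.nan, 22.0, 2.0],
    "Column Name 2": [np.nan, "C44", "C5"],
    "Column3": ["C3", "C55", "C11"],
    "Column (4)": [100.0, np.nan, 13.0],
})

# One statistic per column; the result is a Series indexed by column name.
fill_values = df.agg({"Column (Name)": "mean", "Column (4)": "min"})

# fillna matches the Series index against the column names.
filled = df.fillna(fill_values)
print(filled)
```

Because fill_values only has entries for the two numeric columns, the missing value in Column Name 2 stays NaN.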
Actually, running your code gives me no error. Please compare your dtypes with the ones in my code.
import io
import pandas as pd
Read your data:
# Columns are separated by two or more spaces, because the names contain single spaces.
df = pd.read_csv(io.StringIO("""
Column (Name)  Column Name 2  Column3  Column (4)
NULL           NULL           C3       100
22             C44            C55      NULL
2              C5             C11      13
"""), sep=r"\s\s+", engine="python")
Check the data types:
df.dtypes
Column (Name)    float64
Column Name 2     object
Column3           object
Column (4)       float64
dtype: object
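One possible explanation for the difference: read_csv treats the string NULL as a missing value by default, so the columns come out as float64 here. If your data was loaded in a way that keeps NULL as text, the columns hold strings and numeric operations on them can fail. The sketch below illustrates this with a small comma-separated sample; the column names a and b are made up.

```python
import io
import pandas as pd

csv_text = "a,b\nNULL,100\n22,NULL\n2,13\n"

# Default behaviour: 'NULL' is recognised as a missing value.
default = pd.read_csv(io.StringIO(csv_text))

# With the default NA markers disabled, 'NULL' stays a literal string.
literal = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default.dtypes)
print(literal.dtypes)
print(literal["a"].tolist())
```

If your dtypes look like the second case, converting with pd.to_numeric(..., errors='coerce') before calling fillna should resolve it.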
Then fill in the mean and the min:
df['Column (Name)']=df['Column (Name)'].fillna(df['Column (Name)'].mean())
df['Column (4)']=df['Column (4)'].fillna(df['Column (4)'].min())
The filled-in values are 12.0 (the mean of Column (Name)) and 13.0 (the min of Column (4)).