How to replace NULL Values in columns with Special characters in pandas

I have a data frame with the column names shown below:

Column (Name)     Column Name 2   Column3   Column (4)
NULL                 NULL             C3       100
22                    C44            C55       NULL
2                      C5            C11       13

I want to replace the null values in a subset of columns, say Column (Name) and Column (4), with the column mean and the column min respectively. How can I do this? The values in Column (Name) and Column (4) are numeric.

 df['Column (Name)']=df['Column (Name)'].fillna(df['Column (Name)'].mean())
 df['Column (4)']=df['Column (4)'].fillna(df['Column (4)'].min())

I get the error below:

TypeError: can only concatenate str (not "int") to str

Expected output:

 Column (Name)     Column Name 2   Column3   Column (4)
 12                NULL            C3        100
 22                C44             C55       13
 2                 C5              C11       13
asked Dec 20 '25 12:12 by noob


2 Answers

Your error means the column contains some non-numeric values.

Check whether the columns are numeric by inspecting df.dtypes; if they are not, convert them:

print(df.dtypes)

Then you can check which values cannot be converted:

print (df.loc[pd.to_numeric(df['Column (Name)'], errors='coerce').isna(), 'Column (Name)'])
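
To make the check above concrete, here is a minimal sketch. It assumes a frame whose numeric columns were loaded as strings, with missing entries stored as the literal text "NULL"; the sample values are taken from the table in the question, and the helper name unparseable is hypothetical. It lists the entries that pd.to_numeric cannot parse, skipping values that were already missing.

```python
import pandas as pd

# Assumed data: numbers stored as strings, missing values as the text "NULL".
df = pd.DataFrame({
    "Column (Name)": ["NULL", "22", "2"],
    "Column (4)": ["100", "NULL", "13"],
})

def unparseable(series):
    """Return the entries of `series` that pd.to_numeric cannot convert."""
    converted = pd.to_numeric(series, errors="coerce")
    # Keep rows that were present originally but became NaN after coercion.
    mask = converted.isna() & series.notna()
    return series[mask]

bad_name = unparseable(df["Column (Name)"])
bad_four = unparseable(df["Column (4)"])
print(bad_name.tolist())
print(bad_four.tolist())
```

If either list is non-empty, the column holds text rather than numbers, which would explain the TypeError raised by mean().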

Finally, convert the columns to numeric:

df['Column (Name)'] = pd.to_numeric(df['Column (Name)'], errors='coerce')
df['Column (4)'] = pd.to_numeric(df['Column (4)'], errors='coerce')

Or, if you want to convert multiple columns at once:

cols = ['Column (Name)','Column (4)']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

And then use your solution:

df['Column (Name)']=df['Column (Name)'].fillna(df['Column (Name)'].mean())
df['Column (4)']=df['Column (4)'].fillna(df['Column (4)'].min())

Or you can compute both fill values in one step with DataFrame.agg:

df = df.fillna(df.agg({'Column (Name)':'mean', 'Column (4)':'min'}))
print (df)
   Column (Name) Column Name 2 Column3  Column (4)
0           12.0           NaN      C3       100.0
1           22.0           C44     C55        13.0
2            2.0            C5     C11        13.0
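
The steps of this answer can be combined into one runnable sketch. The frame below is an assumed reconstruction of the question's data in which every column was read as text; the real frame may differ. Note that in this sketch Column Name 2 keeps the text "NULL", because only the two numeric columns are converted.

```python
import pandas as pd

# Assumed reconstruction of the question's frame, with every value stored as text.
df = pd.DataFrame({
    "Column (Name)": ["NULL", "22", "2"],
    "Column Name 2": ["NULL", "C44", "C5"],
    "Column3": ["C3", "C55", "C11"],
    "Column (4)": ["100", "NULL", "13"],
})

cols = ["Column (Name)", "Column (4)"]
# Text that cannot be parsed as a number, such as "NULL", becomes NaN.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

# One fill value per column: the mean for the first, the min for the second.
fill_values = df.agg({"Column (Name)": "mean", "Column (4)": "min"})
filled = df.fillna(fill_values)
print(filled)
```

Passing a Series to fillna fills each column with the value stored under that column's label, so columns that are not listed in the agg dictionary are left untouched.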
answered Dec 22 '25 02:12 by jezrael


Actually, your code runs without error for me. Please compare your dtypes with the ones in my code below.

import io
import pandas as pd

Reading your data.

df = pd.read_csv(io.StringIO("""
Column (Name)     Column Name 2   Column3   Column (4)
NULL                 NULL             C3       100
22                    C44            C55       NULL
2                      C5            C11       13
"""), sep=r"\s\s+", engine="python")

Check the data types.

df.dtypes

Column (Name)    float64
Column Name 2     object
Column3           object
Column (4)       float64
dtype: object

The code to fill-in mean and min.

df['Column (Name)']=df['Column (Name)'].fillna(df['Column (Name)'].mean())
df['Column (4)']=df['Column (4)'].fillna(df['Column (4)'].min())

Filled-in values are 12.0 and 13.0.
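
One possible reason the dtypes differ from yours: read_csv treats the text "NULL" as a missing value by default, so the columns come out as float64. The sketch below contrasts that default with keep_default_na=False, which leaves "NULL" as text. The comma-separated sample is an assumption made for brevity; how your frame was actually loaded is not known.

```python
import io
import pandas as pd

# Assumed comma-separated version of the two numeric columns from the question.
raw = "Column (Name),Column (4)\nNULL,100\n22,NULL\n2,13\n"

# Default parsing: "NULL" is one of pandas' default NA markers, so it becomes NaN.
parsed = pd.read_csv(io.StringIO(raw))

# With the default NA markers switched off, "NULL" stays as text
# and the columns are no longer numeric.
as_text = pd.read_csv(io.StringIO(raw), keep_default_na=False)

print(parsed.dtypes)
print(as_text.dtypes)
```

If your own df.dtypes looks like the second result, converting the columns with pd.to_numeric before calling fillna should make the original code work.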

answered Dec 22 '25 03:12 by Ruthger Righart


