I am working on a large dataset with many columns of different types. There is a mix of numeric values and strings, with some NULL values. I need to change each NULL value to a blank or to 0, depending on the column type.
1 John 2 Doe 3 Mike 4 Orange 5 Stuff
9 NULL NULL NULL 8 NULL NULL Lemon 12 NULL
I want it to look like this:
1 John 2 Doe 3 Mike 4 Orange 5 Stuff
9 0 8 0 Lemon 12
I can do this for each individual column, but since I am going to be pulling several extremely large datasets with hundreds of columns, I'd like to do it some other way.
Edit: dtypes from a smaller dataset:
Field1 object
Field2 object
Field3 object
Field4 object
Field5 object
Field6 object
Field7 object
Field8 object
Field9 object
Field10 float64
Field11 float64
Field12 float64
Field13 float64
Field14 float64
Field15 object
Field16 float64
Field17 object
Field18 object
Field19 float64
Field20 float64
Field21 int64
Use the DataFrame.fillna(0) method to replace NaN/None values with 0. It does not modify the original DataFrame; it returns a new one.
The fillna() method replaces NULL values with a specified value. It returns a new DataFrame unless the inplace parameter is set to True, in which case the replacement is done in the original DataFrame instead.
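As a minimal sketch of the difference between the returned copy and the inplace variant (the sample frame and its column names are made up for illustration, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two missing values
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0]})

# Default behaviour: a new DataFrame is returned, df itself is untouched
filled = df.fillna(0)

# inplace=True modifies the object it is called on (here a copy of df)
df_copy = df.copy()
df_copy.fillna(0, inplace=True)

print(df.isna().sum().sum())       # missing values left in the original
print(filled.isna().sum().sum())   # missing values left in the returned copy
```

Assigning the result back (df = df.fillna(0)) is usually the simpler choice and has the same effect for most purposes.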
You can replace blank/empty values with the DataFrame.replace() method. It replaces every occurrence of a specified value with another value, either in specified columns or across all columns of a DataFrame.
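One case where replace() may help: if the source delivers the literal text "NULL" or empty strings rather than real missing values, fillna() will not see them. A small sketch, assuming that is how the data arrives (the frame below is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data where missing entries arrive as text, not as NaN
df = pd.DataFrame({
    "name": ["John", "NULL", ""],
    "qty": [2.0, np.nan, 8.0],
})

# Turn the placeholder strings into real missing values first
cleaned = df.replace(["NULL", ""], np.nan)

print(cleaned["name"].isna().sum())  # count of entries now treated as missing
```

After this step the type-based fillna() approaches apply as usual.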
Use DataFrame.select_dtypes to get the numeric columns, select that subset and replace its missing values with 0, then replace the missing values in all other columns with an empty string:
print (df)
0 1 2 3 4 5 6 7 8 9
0 1 John 2.0 Doe 3 Mike 4.0 Orange 5 Stuff
1 9 NaN NaN NaN 8 NaN NaN Lemon 12 NaN
print (df.dtypes)
0 int64
1 object
2 float64
3 object
4 int64
5 object
6 float64
7 object
8 int64
9 object
dtype: object
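import numpy as np
# fill missing values in the numeric columns with 0
c = df.select_dtypes(np.number).columns
df[c] = df[c].fillna(0)
# fill whatever is still missing (non-numeric columns) with an empty string
df = df.fillna("")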
print (df)
0 1 2 3 4 5 6 7 8 9
0 1 John 2.0 Doe 3 Mike 4.0 Orange 5 Stuff
1 9 0.0 8 0.0 Lemon 12
Another solution is to create a dictionary of fill values per column and pass it to fillna:
num_cols = df.select_dtypes(np.number).columns
d1 = dict.fromkeys(num_cols, 0)
d2 = dict.fromkeys(df.columns.difference(num_cols), "")
d = {**d1, **d2}
print (d)
{0: 0, 2: 0, 4: 0, 6: 0, 8: 0, 1: '', 3: '', 5: '', 7: '', 9: ''}
df = df.fillna(d)
print (df)
0 1 2 3 4 5 6 7 8 9
0 1 John 2.0 Doe 3 Mike 4.0 Orange 5 Stuff
1 9 0.0 8 0.0 Lemon 12
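Since the question mentions several large datasets, the dictionary approach could be wrapped in a small helper and reused. This is only a sketch: the function name fill_nulls_by_dtype and its parameters are hypothetical, and the sample frame is invented.

```python
import numpy as np
import pandas as pd

def fill_nulls_by_dtype(df, num_fill=0, other_fill=""):
    """Return a new DataFrame with missing values filled per column type."""
    num_cols = df.select_dtypes(include=np.number).columns
    # One fill value per column: num_fill for numeric, other_fill otherwise
    fill_map = {
        col: (num_fill if col in num_cols else other_fill)
        for col in df.columns
    }
    return df.fillna(fill_map)

# Hypothetical sample with an integer, a string and a float column
sample = pd.DataFrame({
    "id": [1, 9],
    "name": ["John", None],
    "qty": [2.0, np.nan],
})
result = fill_nulls_by_dtype(sample)
print(result)
```

Note that this treats every non-numeric column the same way, so datetime or boolean columns would also receive the empty string; those may need separate handling.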
You could try this to substitute a different value for each column (A to C are numeric, while D is a string):
import pandas as pd
import numpy as np
df_pd = pd.DataFrame([[np.nan, 2, np.nan, '0'],
[3, 4, np.nan, '1'],
[np.nan, np.nan, np.nan, '5'],
[np.nan, 3, np.nan, np.nan]],
columns=list('ABCD'))
# fillna returns a new DataFrame, so assign the result to keep it
df_pd = df_pd.fillna(value={'A':0.0,'B':0.0,'C':0.0,'D':''})