Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas DataFrame - Replace NULL String with Blank and NULL Numeric with 0

Tags:

python

pandas

I am working on a large dataset with many columns of different types. There are a mix of numeric values and strings with some NULL values. I need to change the NULL Value to Blank or 0 depending on the type.

1   John   2    Doe   3   Mike   4    Orange   5   Stuff
9   NULL   NULL NULL  8   NULL   NULL Lemon    12  NULL

I want it to look like this,

1   John   2    Doe   3   Mike   4    Orange   5   Stuff
9          0          8          0    Lemon    12  

I can do this for each individual, but since I am going to be pulling several extremely large datasets with hundreds of columns, I'd like to do this some other way.

Edit: Types from Smaller Dataset,

Field1              object
Field2              object
Field3              object
Field4              object
Field5              object
Field6              object
Field7              object
Field8              object
Field9              object
Field10              float64
Field11              float64
Field12              float64
Field13              float64
Field14              float64
Field15              object
Field16              float64
Field17              object
Field18              object
Field19              float64
Field20              float64
Field21              int64
like image 205
HMan06 Avatar asked Oct 18 '18 12:10

HMan06


People also ask

How do you replace NaN with 0 in Python?

Use the DataFrame. fillna(0) method to replace NaN/None values with the 0 value. It doesn't change the object data but returns a new DataFrame.

How do you replace null values in a DataFrame in Python?

The fillna() method replaces the NULL values with a specified value. The fillna() method returns a new DataFrame object unless the inplace parameter is set to True , in that case the fillna() method does the replacing in the original DataFrame instead.

How do you replace a blank string in a DataFrame?

You can replace blank/empty values with DataFrame. replace() methods. The replace() method replaces the specified value with another specified value on a specified column or on all columns of a DataFrame; replaces every case of the specified value.


2 Answers

Use DataFrame.select_dtypes for numeric columns, filter by subset and replace values to 0, then repalce all another columns to empty string:

print (df)
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9   NaN  NaN  NaN  8   NaN  NaN   Lemon  12    NaN

print (df.dtypes)
0      int64
1     object
2    float64
3     object
4      int64
5     object
6    float64
7     object
8      int64
9     object
dtype: object

c = df.select_dtypes(np.number).columns
df[c] = df[c].fillna(0)
df = df.fillna("")
print (df)
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9        0.0       8        0.0   Lemon  12       

Another solution is create dictionary for replace:

num_cols = df.select_dtypes(np.number).columns
d1 = dict.fromkeys(num_cols, 0)
d2 = dict.fromkeys(df.columns.difference(num_cols), "")

d  = {**d1,  **d2}
print (d)
{0: 0, 2: 0, 4: 0, 6: 0, 8: 0, 1: '', 3: '', 5: '', 7: '', 9: ''}

df = df.fillna(d)
print (df)
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9        0.0       8        0.0   Lemon  12       
like image 53
jezrael Avatar answered Oct 24 '22 21:10

jezrael


You could try this to substitute a different value for each different column (A to C are numeric, while D is a string):

import pandas as pd
import numpy as np

df_pd = pd.DataFrame([[np.nan, 2, np.nan, '0'],
        [3, 4, np.nan, '1'],
        [np.nan, np.nan, np.nan, '5'],
        [np.nan, 3, np.nan, np.nan]],
        columns=list('ABCD'))

df_pd.fillna(value={'A':0.0,'B':0.0,'C':0.0,'D':''})
like image 43
Andrea Avatar answered Oct 24 '22 20:10

Andrea