Pandas DataFrame - Replace NULL String with Blank and NULL Numeric with 0

Tags: python, pandas

I am working on a large dataset with many columns of different types. There is a mix of numeric values and strings, with some NULL values. I need to change each NULL value to a blank or to 0, depending on the column's type.

1   John   2    Doe   3   Mike   4    Orange   5   Stuff
9   NULL   NULL NULL  8   NULL   NULL Lemon    12  NULL

I want it to look like this:

1   John   2    Doe   3   Mike   4    Orange   5   Stuff
9          0          8          0    Lemon    12

I can do this for each individual column, but since I am going to be pulling several extremely large datasets with hundreds of columns, I'd like a way that does not require handling every column separately.

Edit: dtypes from a smaller dataset:

Field1              object
Field2              object
Field3              object
Field4              object
Field5              object
Field6              object
Field7              object
Field8              object
Field9              object
Field10              float64
Field11              float64
Field12              float64
Field13              float64
Field14              float64
Field15              object
Field16              float64
Field17              object
Field18              object
Field19              float64
Field20              float64
Field21              int64

asked Oct 18 '18 12:10

HMan06

2 Answers

Use DataFrame.select_dtypes to pick out the numeric columns, fill the missing values in that subset with 0, then replace the missing values in all the other columns with an empty string:

print (df)
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9   NaN  NaN  NaN  8   NaN  NaN   Lemon  12    NaN

print (df.dtypes)
0      int64
1     object
2    float64
3     object
4      int64
5     object
6    float64
7     object
8      int64
9     object
dtype: object

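import numpy as np

# numeric columns: fill missing values with 0
c = df.select_dtypes(np.number).columns
df[c] = df[c].fillna(0)
# remaining (non-numeric) columns: fill missing values with an empty string
df = df.fillna("")
print (df)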
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9        0.0       8        0.0   Lemon  12
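
Since the question mentions several large datasets, the steps above could be wrapped in a small helper and applied to each frame. This is only a minimal sketch: the function name fill_nulls_by_dtype and the sample frame are made up for illustration, and it assumes the NULLs have already been read in as NaN (as the float64 dtypes in the question suggest).

```python
import numpy as np
import pandas as pd


def fill_nulls_by_dtype(df):
    """Return a copy with NaN -> 0 in numeric columns and NaN -> "" elsewhere.

    Hypothetical helper (not from the answer); assumes missing values are NaN.
    """
    out = df.copy()
    # Numeric columns get 0 ...
    num_cols = out.select_dtypes(include=np.number).columns
    out[num_cols] = out[num_cols].fillna(0)
    # ... and whatever is still missing (non-numeric columns) gets "".
    return out.fillna("")


# Illustrative data only; the column names are assumptions.
sample = pd.DataFrame({
    "id": [1, 9],
    "first": ["John", np.nan],
    "score": [2.0, np.nan],
    "fruit": ["Orange", "Lemon"],
})

result = fill_nulls_by_dtype(sample)
print(result)
```

The helper works on a copy, so the original frame keeps its NaN values. Note that filled float columns stay float64, which is why the output above shows 0.0 rather than 0.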

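Another solution is to build a dictionary of fill values, one per column, and pass it to fillna:

# 0 for every numeric column, "" for every other column
num_cols = df.select_dtypes(np.number).columns
d1 = dict.fromkeys(num_cols, 0)
d2 = dict.fromkeys(df.columns.difference(num_cols), "")

# merge both mappings into one {column: fill value} dictionary
d = {**d1, **d2}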
print (d)
{0: 0, 2: 0, 4: 0, 6: 0, 8: 0, 1: '', 3: '', 5: '', 7: '', 9: ''}

df = df.fillna(d)
print (df)
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9        0.0       8        0.0   Lemon  12
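
The same dictionary could also be built in one pass over df.dtypes. The sketch below is a variation, not the answer's own code: it uses pandas.api.types.is_numeric_dtype instead of select_dtypes, and the three columns (named after Field1, Field10 and Field21 from the question's dtype listing) hold made-up values.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Illustrative frame: one object, one float64 and one int64 column.
df = pd.DataFrame({
    "Field1": ["John", np.nan],
    "Field10": [2.0, np.nan],
    "Field21": [1, 9],
})

# Map each column to its fill value: 0 for numeric dtypes, "" otherwise.
fill_values = {
    col: 0 if is_numeric_dtype(dtype) else ""
    for col, dtype in df.dtypes.items()
}

filled = df.fillna(fill_values)
print(filled)
```

Building the mapping from df.dtypes keeps the keys in the frame's column order, which can make the dictionary easier to inspect when there are hundreds of columns.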

answered Oct 24 '22 21:10

jezrael

You could try this to substitute a different fill value for each column (A to C are numeric, while D holds strings):

import pandas as pd
import numpy as np

df_pd = pd.DataFrame([[np.nan, 2, np.nan, '0'],
                      [3, 4, np.nan, '1'],
                      [np.nan, np.nan, np.nan, '5'],
                      [np.nan, 3, np.nan, np.nan]],
                     columns=list('ABCD'))

# fillna returns a new DataFrame, so assign the result to keep it
df_pd = df_pd.fillna(value={'A': 0.0, 'B': 0.0, 'C': 0.0, 'D': ''})

answered Oct 24 '22 20:10

Andrea
