How to replace empty cells with 0 and change strings to integers where possible in a pandas dataframe?

Tags:

python

pandas

I have a dataframe with 3000+ columns. Many cells in the dataframe are blank strings (' '). I also have a lot of numerical values that are stored as strings but should actually be integers. I wrote two functions to fill all the empty cells with 0 and, where possible, change the values to integers, but when I run them nothing changes in my dataframe. The functions:

import numpy as np

def recode_empty_cells(dataframe, list_of_columns):

    for column in list_of_columns:
        dataframe[column].replace(r'\s+', np.nan, regex=True)
        dataframe[column].fillna(0)

    return dataframe

def change_string_to_int(dataframe, list_of_columns):

    dataframe = recode_empty_cells(dataframe, list_of_columns)

    for column in list_of_columns:
        try:
            dataframe[column] = dataframe[column].astype(int)
        except ValueError:
            pass

    return dataframe

Note: I'm using a try/except statement because some columns contain text in some form. Thanks in advance for your help.
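Why nothing changes: `Series.replace` and `Series.fillna` return a new Series by default, and the loop in `recode_empty_cells` never assigns those results back to the dataframe. Below is a minimal sketch of the first function with the results assigned. Two details are assumptions, not part of the question: the columns hold object-dtype strings, and the pattern is anchored as `r'^\s*$'` so that only empty or whitespace-only cells are treated as blank.

```python
import numpy as np
import pandas as pd

def recode_empty_cells(dataframe, list_of_columns):
    for column in list_of_columns:
        # replace() returns a new Series, so assign it back to the column.
        # The anchored pattern matches only empty or whitespace-only strings.
        dataframe[column] = dataframe[column].replace(r'^\s*$', np.nan, regex=True)
        # fillna() also returns a new Series.
        dataframe[column] = dataframe[column].fillna(0)
    return dataframe

# Object dtype, as for string columns in the question
df = pd.DataFrame({'a': ['1', ' ', ''], 'b': ['x', '2', np.nan]}, dtype=object)

# Calling replace() without assigning the result leaves df untouched
df['a'].replace(r'^\s*$', np.nan, regex=True)
print(df['a'].tolist())  # ['1', ' ', '']

out = recode_empty_cells(df, ['a', 'b'])
print(out['a'].tolist())  # ['1', 0, 0]
print(out['b'].tolist())  # ['x', '2', 0]
```

Assigning back with `dataframe[column] = ...` modifies the frame that was passed in, so the returned object is the same dataframe.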

Edit:

Thanks to your help I got the first part working. All the empty cells have 0s now. This is my code at this moment:

def recode_empty_cells(dataframe, list_of_columns):

    for column in list_of_columns:
        dataframe[column] = dataframe[column].replace(r'\s+', 0, regex=True)

    return dataframe

def change_string_to_int(dataframe, list_of_columns):

    dataframe = recode_empty_cells(dataframe, list_of_columns)

    for column in list_of_columns:
        try:
            dataframe[column] = dataframe[column].astype(int)
        except ValueError:
            pass

    return dataframe
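A side note on the pattern in this version: when the replacement value is not a string, pandas replaces the whole cell whenever the regex is found anywhere in it. So `r'\s+'` also turns text such as 'New York' into 0, and it never matches a truly empty string '' because that contains no whitespace. A small comparison, assuming object-dtype string cells as in the question; anchoring the pattern as `r'^\s*$'` is one way to target only blank cells:

```python
import pandas as pd

s = pd.Series(['12', 'New York', ' ', ''], dtype=object)

# Unanchored: any cell containing whitespace is replaced wholesale,
# while '' (no whitespace at all) is left alone
loose = s.replace(r'\s+', 0, regex=True)
print(loose.tolist())   # ['12', 0, 0, '']

# Anchored: only empty or whitespace-only cells become 0
strict = s.replace(r'^\s*$', 0, regex=True)
print(strict.tolist())  # ['12', 'New York', 0, 0]
```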

However, this gives me the following error: OverflowError: Python int too large to convert to C long
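A likely cause, hedged because the platform is not stated: on Windows with NumPy versions before 2.0, `astype(int)` maps to a 32-bit C long, so any value above 2,147,483,647 overflows. Requesting `np.int64` explicitly removes the dependence on the platform default. The sketch below also catches `OverflowError`, so a column holding values beyond even the int64 range is left unconverted; that policy is an assumption, not something from the question.

```python
import numpy as np
import pandas as pd

def change_string_to_int(dataframe, list_of_columns):
    for column in list_of_columns:
        try:
            # np.int64 is 64-bit on every platform, unlike the builtin `int`,
            # which NumPy < 2.0 maps to a 32-bit C long on Windows
            dataframe[column] = dataframe[column].astype(np.int64)
        except (ValueError, OverflowError):
            # Text that is not an integer, or a value outside the int64 range:
            # leave the column unchanged
            pass
    return dataframe

# Mixed strings and 0s, as left behind by recode_empty_cells
df = pd.DataFrame({'big': ['3000000000', 0, '42'], 'text': ['a', 0, '1']})
out = change_string_to_int(df, ['big', 'text'])
print(out.dtypes)
```

Here 'big' (3,000,000,000 exceeds the 32-bit limit) becomes an int64 column, while 'text' stays as it was because 'a' cannot be parsed as an integer.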

asked Nov 10 '16 at 15:11 by RF_PY


1 Answer

Consider the df:

import pandas as pd

df = pd.DataFrame(dict(A=['2', 'hello'], B=['', '3']))
df



apply

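def convert_fill(df):
    # stack() turns the frame into one Series of cells, to_numeric converts each
    # cell ('' becomes NaN), fillna(0) fills the blanks, unstack() restores the shape
    return df.stack().apply(pd.to_numeric, errors='ignore').fillna(0).unstack()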

convert_fill(df)
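`errors='ignore'` in `pd.to_numeric` is deprecated as of pandas 2.2. A sketch of the same stack/convert/unstack idea without it, using a small helper (`to_number_if_possible` is a hypothetical name) that falls back to the original value when parsing fails:

```python
import pandas as pd

def to_number_if_possible(value):
    """Hypothetical helper: parse value as a number, or return it unchanged."""
    try:
        # pd.to_numeric('') gives NaN, which fillna(0) later turns into 0
        return pd.to_numeric(value)
    except (ValueError, TypeError):
        # e.g. 'hello' cannot be parsed, so keep the original text
        return value

def convert_fill(df):
    # stack() -> one Series of cells, convert each cell, fill NaN, reshape back
    return df.stack().apply(to_number_if_possible).fillna(0).unstack()

df = pd.DataFrame(dict(A=['2', 'hello'], B=['', '3']))
result = convert_fill(df)
print(result)
```

For the example frame, '2' and '3' become numbers, the empty string becomes 0 and 'hello' is kept as text. Columns that mix numbers and text remain object dtype.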


answered Oct 21 '22 at 22:10 by piRSquared