I have a dataframe with 3000+ columns. Many cells in the dataframe are empty strings (' '). I also have a lot of numerical values that are strings but should actually be integers. I wrote two functions to fill all the empty cells with 0 and, where possible, convert the values to integers, but when I run them nothing changes in my dataframe. The functions:
def recode_empty_cells(dataframe, list_of_columns):
    for column in list_of_columns:
        dataframe[column].replace(r'\s+', np.nan, regex=True)
        dataframe[column].fillna(0)
    return dataframe

def change_string_to_int(dataframe, list_of_columns):
    dataframe = recode_empty_cells(dataframe, list_of_columns)
    for column in list_of_columns:
        try:
            dataframe[column] = dataframe[column].astype(int)
        except ValueError:
            pass
    return dataframe
Note: I'm using a try/except statement because some columns contain text in some form. Thanks in advance for your help.
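A likely reason nothing changes: `Series.replace()` and `Series.fillna()` return new objects by default and leave the original untouched, so the result has to be assigned back. The snippet below is only a small sketch of that behaviour on a made-up column; the anchored pattern `^\s*$` is my own assumption, chosen so that only cells that are entirely blank match.

```python
import numpy as np
import pandas as pd

# Made-up column for illustration; object dtype because it holds text.
column = pd.Series(['1', ' ', '3'], dtype=object)

# Calling replace() without assigning the result leaves `column` as it was.
column.replace(r'^\s*$', np.nan, regex=True)
unchanged = column.tolist()

# Assigning the result back is what actually updates the data.
column = column.replace(r'^\s*$', np.nan, regex=True).fillna(0)
changed = column.tolist()
```

Here `unchanged` still contains the blank cell, while `changed` has 0 in its place.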
Edit:
Thanks to your help I got the first part working. All the empty cells have 0s now. This is my code at this moment:
def recode_empty_cells(dataframe, list_of_columns):
    for column in list_of_columns:
        dataframe[column] = dataframe[column].replace(r'\s+', 0, regex=True)
    return dataframe

def change_string_to_int(dataframe, list_of_columns):
    dataframe = recode_empty_cells(dataframe, list_of_columns)
    for column in list_of_columns:
        try:
            dataframe[column] = dataframe[column].astype(int)
        except ValueError:
            pass
    return dataframe
However, this gives me the following error: OverflowError: Python int too large to convert to C long
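The `OverflowError` is not caught by `except ValueError`, so one oversized value stops the whole loop. One possible workaround, sketched below under my own assumptions, is to catch `OverflowError` as well and leave such columns as they are. The sample data, the explicit `'int64'` target and the anchored pattern `^\s*$` are illustrative choices, not part of the original code.

```python
import pandas as pd

def change_string_to_int(dataframe, list_of_columns):
    for column in list_of_columns:
        # Replace cells that are empty or whitespace-only with 0.
        filled = dataframe[column].replace(r'^\s*$', 0, regex=True)
        try:
            dataframe[column] = filled.astype('int64')
        except (ValueError, TypeError, OverflowError):
            # Text or values too large for int64: keep the column unconverted.
            dataframe[column] = filled
    return dataframe

# Hypothetical sample: one clean column, one with an oversized value, one with text.
sample = pd.DataFrame(
    {
        'small': ['1', ' ', '3'],
        'big': ['99999999999999999999999', '5', ' '],
        'text': ['hello', '', '7'],
    },
    dtype=object,
)
sample = change_string_to_int(sample, list(sample.columns))
```

With this sketch only `small` ends up as an integer column; `big` and `text` keep their original values apart from the blanks, which become 0.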
Using apply(): Another way to replace blank values with NaN is DataFrame.apply() with a lambda function. apply() applies a function along one axis of the DataFrame; with the default axis=0, the function is applied to each column.
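As a rough illustration of the apply() approach, the sketch below maps a lambda over every column and turns blank strings into NaN. The helper name `blank_to_nan` and the sample frame are made up for this example.

```python
import numpy as np
import pandas as pd

# Made-up frame with empty and whitespace-only cells.
df = pd.DataFrame({'A': ['2', ' ', 'hello'], 'B': ['', '3', '   ']}, dtype=object)

def blank_to_nan(column):
    # Hypothetical helper: blank strings become NaN, everything else is kept.
    return column.map(
        lambda v: np.nan if isinstance(v, str) and v.strip() == '' else v
    )

# With the default axis=0, apply() passes each column to the function in turn.
result = df.apply(blank_to_nan)
```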
Method #1: Using lambda. This task can be performed with a lambda function: each value is tested with the or operator, so that None or an empty string is replaced with None.
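A minimal sketch of that idea on a plain Python list (the sample values are made up). Note that `or` only catches falsy values, so a whitespace-only string such as ' ' passes through unchanged.

```python
values = ['2', '', None, 'hello', ' ']

# `s or None` yields None for '' and None, and the value itself otherwise.
cleaned = list(map(lambda s: s or None, values))
```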
Replace using mean, median, or mode: A common way to replace empty cells is to fill them with the mean, median, or mode of the column.
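For a numeric column this could look roughly like the following; the `points` column and its values are invented for the example.

```python
import pandas as pd

scores = pd.DataFrame({'points': [10.0, None, 30.0, 10.0]})

# Each statistic skips the missing cell when it is computed.
mean_filled = scores['points'].fillna(scores['points'].mean())
median_filled = scores['points'].fillna(scores['points'].median())
# mode() returns a Series because there can be ties; take the first entry.
mode_filled = scores['points'].fillna(scores['points'].mode()[0])
```

Which statistic fits best depends on the data: the mean is sensitive to outliers, while the median and mode are not.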
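Consider the df:
import pandas as pd

df = pd.DataFrame(dict(A=['2', 'hello'], B=['', '3']))
df
Using apply:
def convert_fill(df):
    # Convert each cell to a number where possible, then fill blanks with 0.
    # errors='ignore' leaves non-numeric cells unchanged (deprecated since pandas 2.2).
    return df.stack().apply(pd.to_numeric, errors='ignore').fillna(0).unstack()

convert_fill(df)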
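On pandas versions where `errors='ignore'` triggers a deprecation warning, a cell-wise variant that catches the exceptions explicitly may be an option. This is only a sketch; the names `to_number_or_keep` and `convert_fill_cellwise` are hypothetical, and going cell by cell could be slow on a frame with thousands of columns.

```python
import pandas as pd

def to_number_or_keep(value):
    # Blank or whitespace-only strings become 0.
    if isinstance(value, str) and value.strip() == '':
        return 0
    try:
        # Python ints have arbitrary precision, so large values do not overflow here.
        return int(value)
    except (ValueError, TypeError):
        # Not an integer: keep the original value.
        return value

def convert_fill_cellwise(df):
    return df.apply(lambda col: col.map(to_number_or_keep))

df = pd.DataFrame(dict(A=['2', 'hello'], B=['', '3']), dtype=object)
out = convert_fill_cellwise(df)
```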