
Pandas - How to replace string with zero values in a DataFrame series?

I'm importing some CSV data into a Pandas DataFrame (in Python). One series is meant to be all numerical values. However, it also contains some spurious "$-" elements represented as strings. These have been left over from previous formatting. If I just import the series, Pandas reports its dtype as 'object'.

What's the best way to replace these "$-" strings with zeros? Or more generally, how can I replace all the strings in a series (which is predominantly numerical), with a numerical value, and convert the series to a floating point type?

  • Steve
Steve Maughan asked Oct 30 '15 16:10




1 Answer

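You can use the convert_objects method of the DataFrame, with convert_numeric=True, to change the strings to NaNs. Note that convert_objects was deprecated in pandas 0.17 and has since been removed; in current releases pd.to_numeric with errors='coerce' does the same job.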

From the docs:

convert_numeric: If True, attempt to coerce to numbers (including strings), with unconvertible values becoming NaN.

In [17]: df
Out[17]: 
    a   b  c
0  1.  2.  4
1  sd  2.  4
2  1.  fg  5

In [18]: df2 = df.convert_objects(convert_numeric=True)

In [19]: df2
Out[19]: 
    a   b  c
0   1   2  4
1 NaN   2  4
2   1 NaN  5
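
On pandas versions where convert_objects is no longer available, the same coercion can be sketched with pd.to_numeric. This is a minimal sketch, not the original answer's code: the frame below is rebuilt from the example above on the assumption that every value arrives as a string, as it would from a CSV with stray text in it.

```python
import pandas as pd

# Rebuild the example frame; values are assumed to be strings (an assumption,
# the original answer does not show how df was constructed).
df = pd.DataFrame({
    "a": ["1.", "sd", "1."],
    "b": ["2.", "2.", "fg"],
    "c": ["4", "4", "5"],
})

# Apply pd.to_numeric to each column; strings that cannot be parsed become NaN.
df2 = df.apply(pd.to_numeric, errors="coerce")
print(df2)
```

Columns a and b end up as floating point because NaN is a float value, while column c, which has nothing to coerce, becomes an integer column.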

Finally, if you want to convert those NaNs to 0's, you can use df.fillna

In [20]: df2.fillna(0)
Out[20]: 
   a  b  c
0  1  2  4
1  0  2  4
2  1  0  5
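
For the single-column case in the question, where the only stray value is "$-", a minimal sketch might look like the following. The series contents and the names prices, cleaned and coerced are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical column as it might arrive from the CSV:
# numbers stored as strings, plus the leftover "$-" placeholder.
prices = pd.Series(["12.5", "$-", "3", "$-", "7.25"])

# Option 1: replace the known placeholder with "0", then cast to float.
cleaned = prices.replace("$-", "0").astype(float)

# Option 2: coerce anything non-numeric to NaN, then fill the gaps with zero.
coerced = pd.to_numeric(prices, errors="coerce").fillna(0)

print(cleaned.tolist())
print(coerced.tolist())
```

Option 1 fails loudly if some other unexpected string is present, which can be useful as a sanity check. Option 2 silently turns every unparseable value into zero, so it is the more general but less strict choice.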
tmdavison answered Nov 15 '22 02:11