I'm importing some csv data into a Pandas DataFrame (in Python). One series is meant to be all numerical values. However, it also contains some spurious "$-" elements represented as strings. These have been left over from previous formatting. If I just import the series, Pandas reports it as a series of 'object'.
What's the best way to replace these "$-" strings with zeros? Or more generally, how can I replace all the strings in a series (which is predominantly numerical), with a numerical value, and convert the series to a floating point type?
You can replace values of all or selected columns based on the condition of pandas DataFrame by using DataFrame. loc[ ] property. The loc[] is used to access a group of rows and columns by label(s) or a boolean array. It can access and can also manipulate the values of pandas DataFrame.
You can replace a string in the pandas DataFrame column by using replace(), str. replace() with lambda functions.
You can use the convert_objects
method of the DataFrame
, with convert_numeric=True
to change the strings to NaNs
From the docs:
convert_numeric: If True, attempt to coerce to numbers (including strings), with unconvertible values becoming NaN.
In [17]: df
Out[17]:
a b c
0 1. 2. 4
1 sd 2. 4
2 1. fg 5
In [18]: df2 = df.convert_objects(convert_numeric=True)
In [19]: df2
Out[19]:
a b c
0 1 2 4
1 NaN 2 4
2 1 NaN 5
Finally, if you want to convert those NaNs
to 0
's, you can use df.replace
In [20]: df2.replace('NaN',0)
Out[20]:
a b c
0 1 2 4
1 0 2 4
2 1 0 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With