Convert number strings with commas in pandas DataFrame to float

Tags: python, pandas

I have a DataFrame that contains numbers as strings with commas for the thousands marker. I need to convert them to floats.

import pandas
a = [['1,200', '4,200'], ['7,000', '-0.03'], ['5', '0']]
df = pandas.DataFrame(a)

I am guessing I need to use locale.atof. Indeed

df[0].apply(locale.atof)

works as expected. I get a Series of floats.

But when I apply it to the DataFrame, I get an error.

df.apply(locale.atof)

TypeError: ("cannot convert the series to <type 'float'>", u'occurred at index 0')

and

df[0:1].apply(locale.atof)

gives another error:

ValueError: ('invalid literal for float(): 1,200', u'occurred at index 0')

So, how do I convert this DataFrame of strings to a DataFrame of floats?

390

asked Mar 03 '14 02:03

pheon

1 Answer

If you're reading the data in from a CSV file, you can use the thousands argument of read_csv:

df = pandas.read_csv('foo.tsv', sep='\t', thousands=',')

This method is likely to be more efficient than performing the operation as a separate step.
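As a rough illustration of the thousands argument, here is a minimal sketch that parses an in-memory string instead of a file. The column names a and b and the sample rows are made up for the example (the rows mirror the question's data); only read_csv and its sep and thousands parameters come from the answer.

```python
import io

import pandas as pd

# Hypothetical tab-separated content standing in for 'foo.tsv';
# the header names 'a' and 'b' are assumptions for this sketch.
tsv = "a\tb\n1,200\t4,200\n7,000\t-0.03\n5\t0\n"

# thousands=',' tells the parser to treat commas inside numbers
# as group separators rather than as part of a string.
df = pd.read_csv(io.StringIO(tsv), sep="\t", thousands=",")

# A column holding only whole numbers is typically inferred as an
# integer dtype, so cast explicitly if floats are required everywhere.
df = df.astype(float)
print(df.dtypes)
```

The explicit astype(float) is only there because the question asks for floats in every column; without it, pandas infers each column's dtype on its own.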

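If the strings are already in a DataFrame, you need to set the locale first so that atof treats the comma as a thousands separator. Then use applymap, which calls the function on each element; DataFrame.apply passes each whole column (a Series) to the function, which is why it fails with atof: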

In [ 9]: import locale

In [10]: from locale import atof

In [11]: locale.setlocale(locale.LC_NUMERIC, '')
Out[11]: 'en_GB.UTF-8'

In [12]: df.applymap(atof)
Out[12]:
      0        1
0  1200  4200.00
1  7000    -0.03
2     5     0.00
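If you would rather not depend on the machine's locale settings, one possible alternative (not what the answer above uses) is to strip the commas with string methods and then cast to float. This is a sketch that assumes the comma only ever appears as a thousands separator and never as a decimal mark. Also note that in recent pandas releases (2.1 and later) applymap is deprecated in favour of DataFrame.map, so the locale approach may emit a warning there.

```python
import pandas as pd

a = [['1,200', '4,200'], ['7,000', '-0.03'], ['5', '0']]
df = pd.DataFrame(a)

# DataFrame.apply hands each column to the function as a Series,
# so the vectorised .str accessor can be used on it directly.
# regex=False makes the comma a literal string, not a pattern.
floats = df.apply(
    lambda col: col.str.replace(",", "", regex=False).astype(float)
)
print(floats.dtypes)
```

Here apply works because the lambda expects a whole column, whereas locale.atof expects a single string.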

158

answered Oct 15 '22 10:10

Andy Hayden
