How can I prevent csv.DictWriter() or writerow() rounding my floats?

I have a dictionary that I want to write to a csv file, but the floats in the dictionary are rounded off when I write them to the file. I want to keep the maximum precision. Where does the rounding occur and how can I prevent it?

What I did

I followed the DictWriter example here and I'm running Python 2.6.1 on Mac (10.6 - Snow Leopard).

# my import statements
import sys
import csv

Here is what my dictionary (d) contains:

>>> d = runtime.__dict__
>>> d
{'time_final': 1323494016.8556759,
'time_init': 1323493818.0042379,
'time_lapsed': 198.85143804550171}

The values are indeed floats:

>>> type(runtime.time_init)
<type 'float'>

Then I set up my writer and write the header and values:

f = open(log_filename,'w')
fieldnames = ('time_init', 'time_final', 'time_lapsed')
myWriter = csv.DictWriter(f, fieldnames=fieldnames)
headers = dict( (n,n) for n in fieldnames )
myWriter.writerow(headers)
myWriter.writerow(d)
f.close()

But when I look at the output file, the floats have been rounded:

time_init,time_final,time_lapsed
1323493818.0,1323494016.86,198.851438046

< EOF >


asked Dec 10 '11 08:12

aDroid

2 Answers

It looks like csv is using float.__str__ rather than float.__repr__:

>>> print repr(1323494016.855676)
1323494016.855676
>>> print str(1323494016.855676)
1323494016.86
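To see where the digits go, here is a minimal sketch that mimics the short form on a current Python 3 interpreter. It assumes that Python 2's str() formatted floats with 12 significant digits, which is consistent with the values in the question's output file. short_form is a hypothetical helper name, not part of the csv module. On Python 3, str() and repr() of a float give the same string, so the original behaviour can only be imitated there, not reproduced directly.

```python
# Values taken from the question.
values = [1323494016.8556759, 1323493818.0042379, 198.85143804550171]


def short_form(x):
    """Format x with 12 significant digits.

    Assumption: this stands in for Python 2's str(float). Unlike Python 2,
    it does not append '.0' to results that have no fractional part.
    """
    return "%.12g" % x


for v in values:
    s = short_form(v)
    # The short form cannot be parsed back to the same float; repr() can.
    print(s, float(s) == v, repr(v), float(repr(v)) == v)
```

The second column should be False on every row and the last column True, which is the loss the question describes.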

Looking at the csv source, this appears to be hardwired behavior. A workaround is to convert all of the float values to their repr before csv sees them. Use something like: d = dict((k, repr(v)) for k, v in d.items()).

Here's a worked-out example:

import sys, csv

d = {'time_final': 1323494016.8556759,
     'time_init': 1323493818.0042379,
     'time_lapsed': 198.85143804550171
}

d = dict((k, repr(v)) for k, v in d.items())

fieldnames = ('time_init', 'time_final', 'time_lapsed')
myWriter = csv.DictWriter(sys.stdout, fieldnames=fieldnames)
headers = dict( (n,n) for n in fieldnames )
myWriter.writerow(headers)
myWriter.writerow(d)

This code produces the following output:

time_init,time_final,time_lapsed
1323493818.0042379,1323494016.8556759,198.85143804550171

A more refined approach will take care to only make replacements for floats:

d = dict((k, (repr(v) if isinstance(v, float) else str(v))) for k, v in d.items())
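As a rough check that this kind of conversion preserves the values, the sketch below writes the row to an in-memory buffer and reads it back. It is written for Python 3 (io.StringIO stands in for the log file), and floats_to_repr is a hypothetical helper name. As a variation on the one-liner above, it passes non-float values through untouched, so the csv module's own handling (for example, None written as an empty string) still applies.

```python
import csv
import io


def floats_to_repr(row):
    """Return a copy of row with every float replaced by repr(value)."""
    return dict((k, repr(v) if isinstance(v, float) else v)
                for k, v in row.items())


d = {'time_final': 1323494016.8556759,
     'time_init': 1323493818.0042379,
     'time_lapsed': 198.85143804550171}
fieldnames = ('time_init', 'time_final', 'time_lapsed')

# In-memory stand-in for the output file (an assumption for this sketch).
buf = io.StringIO(newline='')
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writerow(dict((n, n) for n in fieldnames))  # header row
writer.writerow(floats_to_repr(d))

# Read the data row back and compare each field with the original float.
buf.seek(0)
restored = next(csv.DictReader(buf))
round_trip_ok = all(float(restored[k]) == d[k] for k in fieldnames)
print(round_trip_ok)
```

On Python 3 the repr() step is harmless rather than necessary, since str() and repr() of a float agree there. On Python 2.6 it is what keeps the digits.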

Note, I've just fixed this issue for Py2.7.3, so it shouldn't be a problem in the future. See http://hg.python.org/cpython/rev/bf7329190ca6


answered Oct 12 '22 23:10

Raymond Hettinger

It's a known bug^H^H^Hfeature. According to the docs:

"""... the value None is written as the empty string. [snip] All other non-string data are stringified with str() before being written."""

Don't rely on the default conversions. Use repr() for floats. unicode objects need special handling; see the manual. Check whether the consumer of the file will accept the default format of datetime.x objects for x in (datetime, date, time, timedelta).
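One way to make every conversion explicit is to run each value through a small function before handing the row to csv. The sketch below is only an illustration: to_cell is a hypothetical helper, and the ISO 8601 strings and "seconds as a float" used for the datetime types are assumed formats that the consumer of the file would have to accept. It does not attempt the special handling that unicode objects need on Python 2.

```python
import datetime


def to_cell(value):
    """Convert one value to the string that should appear in the csv file."""
    if value is None:
        return ''                      # same result as csv's default for None
    if isinstance(value, float):
        return repr(value)             # keeps enough digits to round-trip
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        return value.isoformat()       # assumed format: ISO 8601
    if isinstance(value, datetime.timedelta):
        return repr(value.total_seconds())  # assumed format: seconds as float
    return value                       # leave everything else to csv


row = {'when': datetime.date(2011, 12, 10),
       'lapsed': datetime.timedelta(seconds=1.5),
       'ratio': 1.0 / 3,
       'note': None}
converted = dict((k, to_cell(v)) for k, v in row.items())
print(converted)
```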

Update:

For float objects, "%f" % value is not a good substitute for repr(value). The criterion is whether the consumer of the file can reproduce the original float object. repr(value) guarantees this. "%f" % value doesn't.

# Python 2.6.6
>>> nums = [1323494016.855676, 1323493818.004238, 198.8514380455017, 1.0 / 3]
>>> for v in nums:
...     rv = repr(v)
...     fv = "%f" % v
...     sv = str(v)
...     print rv, float(rv) == v, fv, float(fv) == v, sv, float(sv) == v
...
1323494016.8556759 True 1323494016.855676 True 1323494016.86 False
1323493818.0042379 True 1323493818.004238 True 1323493818.0 False
198.85143804550171 True 198.851438 False 198.851438046 False
0.33333333333333331 True 0.333333 False 0.333333333333 False

Going only by the strings above, it looks as if none of the %f cases worked, because none of them matches the repr output. The float(fv) == v column shows that the first two do round-trip and the last two don't. Before 2.7, Python's repr always used 17 significant decimal digits. In 2.7, this was changed to use the minimum number of digits that still guarantees float(repr(v)) == v. The difference between the two repr outputs is not a rounding error.

# Python 2.7 output
1323494016.855676 True 1323494016.855676 True 1323494016.86 False
1323493818.004238 True 1323493818.004238 True 1323493818.0 False
198.8514380455017 True 198.851438 False 198.851438046 False
0.3333333333333333 True 0.333333 False 0.333333333333 False

Note the improved repr() results in the first column above.
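If you want the long pre-2.7 style on any interpreter, a fixed 17 significant digits can be requested explicitly. This is a small sketch that assumes the old repr corresponds to C-style "%.17g" formatting. seventeen_digits is a hypothetical helper name. It is compared against "%f", which keeps only six digits after the decimal point.

```python
nums = [1323494016.855676, 1323493818.004238, 198.8514380455017, 1.0 / 3]


def seventeen_digits(x):
    """Format x with 17 significant digits, enough to round-trip a double."""
    return "%.17g" % x


for v in nums:
    long_form = seventeen_digits(v)
    fixed = "%f" % v
    # Column 2 shows whether the 17-digit form parses back to v,
    # column 4 whether the "%f" form does.
    print(long_form, float(long_form) == v, fixed, float(fixed) == v)
```

The 17-digit form round-trips for every value here, while "%f" only does so when the float happens to need no more than six decimal places.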

Update 2 in response to comment """And thanks for the info on Python 2.7. Unfortunately, I'm limited to 2.6.2 (running on the destination machine which can't be upgraded). But I'll keep this in mind for future scripts. """

It doesn't matter. float('0.3333333333333333') == float('0.33333333333333331') produces True on all versions of Python. This means that you could write your file on 2.7 and it would read the same on 2.6, or vice versa. There is no change in the accuracy of what repr(a_float_object) produces.
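The same check can be run for every value in the tables above. The strings below are copied from the 2.7 and 2.6.6 outputs in this answer. The sketch simply parses both spellings and compares the resulting floats.

```python
# (2.7-style repr, 2.6-style repr) for the same float, taken from the outputs above.
pairs = [
    ("1323494016.855676", "1323494016.8556759"),
    ("1323493818.004238", "1323493818.0042379"),
    ("198.8514380455017", "198.85143804550171"),
    ("0.3333333333333333", "0.33333333333333331"),
]

# Each pair should parse to the identical float, whichever version wrote it.
same = [float(short) == float(long_form) for short, long_form in pairs]
print(same)
```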

answered Oct 12 '22 23:10

John Machin

