Precision of numpy array lost after tolist

Tags:

numpy

I have a numpy array in which every number has been rounded to a fixed precision (using around(x, 1)).

[[     3.   15294.7  32977.7   4419.5    978.4    504.4    123.6]
 [     4.   14173.8  31487.2   3853.9    967.8    410.2    107.1]
 [     5.   15323.5  34754.5   3738.7   1034.7    376.1    105.5]
 [     6.   17396.7  41164.5   3787.4   1103.2    363.9    109.4]
 [     7.   19665.5  48967.6   3900.9   1161.     362.1    115.8]
 [     8.   21839.8  56922.5   4037.4   1208.2    365.9    123.5]
 [     9.   23840.6  64573.8   4178.1   1247.     373.2    131.9]
 [    10.   25659.9  71800.2   4314.8   1279.5    382.7    140.5]
 [    11.   27310.3  78577.7   4444.3   1307.1    393.7    149.1]
 [    12.   28809.1  84910.4   4565.8   1331.     405.5    157.4]]

I'm trying to convert every number into a string so that I can write them into a Word table using python-docx. But the result of the tolist() function is a total mess: the precision of the numbers is lost, resulting in very long output.

[['3.0',
  '15294.7001953',
  '32977.6992188',
  '4419.5',
  '978.400024414',
  '504.399993896',
  '123.599998474'],
 ['4.0',
  '14173.7998047',
  '31487.1992188',
  '3853.89990234',
  '967.799987793',
  '410.200012207',
  '107.099998474'],
.......

Besides the tolist() function, I also tried [[str(e) for e in a] for a in m]. The result is the same. This is very annoying. How can I convert to string easily while maintaining the precision? Thanks!

asked Dec 08 '13 14:12

sanqiang

2 Answers

Something is going wrong in your conversion to strings. With plain numbers it works as expected:

>>> import numpy as np
>>> a = np.random.random(10)*30
>>> a
array([ 27.30713434,  10.25895255,  19.65843272,  23.93161555,
        29.08479175,  25.69713898,  11.90236158,   5.41050686,
        18.16481691,  14.12808414])
>>> 
>>> b = np.round(a, decimals=1)
>>> b
array([ 27.3,  10.3,  19.7,  23.9,  29.1,  25.7,  11.9,   5.4,  18.2,  14.1])
>>> b.tolist()
[27.3, 10.3, 19.7, 23.9, 29.1, 25.7, 11.9, 5.4, 18.2, 14.1]

Notice that np.round does not work in-place:

>>> a
array([ 27.30713434,  10.25895255,  19.65843272,  23.93161555,
        29.08479175,  25.69713898,  11.90236158,   5.41050686,
        18.16481691,  14.12808414])

If all you need is to convert numbers to strings:

>>> " ".join(str(_) for _ in np.round(a, 1)) 
'27.3 10.3 19.7 23.9 29.1 25.7 11.9 5.4 18.2 14.1'

EDIT: Apparently, np.round does not play nicely with float32 (other answers give reasons for this). A simple workaround is to cast your array explicitly to np.float64 or just float (np.float was an alias for the built-in float and was removed in NumPy 1.24):

>>> # prepare an array of float32 values
>>> a32  = (np.random.random(10) * 30).astype(np.float32)
>>> a32.dtype
dtype('float32')
>>> 
>>> # notice the use of .astype(np.float64)
>>> np.round(a32.astype(np.float64), 1)
array([  5.5,   8.2,  29.8,   8.6,  15.5,  28.3,   2. ,  24.5,  18.4,   8.3])
>>>
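
A minimal sketch of why the cast helps, assuming a single illustrative value (15294.7, taken from the question's array). Rounding in float32 can only return the nearest float32 value, which tolist() then widens to a Python float with many digits. Rounding after the cast to float64 gives a value that prints as 15294.7:

```python
import numpy as np

# Illustrative value taken from the question's array.
a32 = np.array([15294.7], dtype=np.float32)

# Rounding stays in float32; tolist() widens the result to a Python float,
# which exposes the float32 approximation error.
rounded32 = np.round(a32, 1).tolist()

# Cast to float64 first, then round.
rounded64 = np.round(a32.astype(np.float64), 1).tolist()

print(rounded32)
print(rounded64)
```

The second list holds the float64 value closest to 15294.7, so its repr is short; it is still an approximation, just a much closer one.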

EDIT2: As demonstrated by Warren in his answer, string formatting actually rounds things properly (try "%.1f" % (4.79,)). Thus there's no need to cast between float types. I'll leave my answer mainly as a reminder that using np.around is not the right thing to do in these circumstances.
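
For the nested list of strings the question asks for, a sketch along these lines should work. It assumes a small float32 array m built from the first rows of the question's data, and it leaves out the python-docx step of writing the strings into table cells:

```python
import numpy as np

# Assumed sample data: part of the first two rows of the question's array.
m = np.array([[3.0, 15294.7, 32977.7],
              [4.0, 14173.8, 31487.2]], dtype=np.float32)

# Format each element to one decimal place; no call to np.around is needed
# because "%.1f" rounds while formatting.
rows = [["%.1f" % v for v in row] for row in m]

print(rows)
# [['3.0', '15294.7', '32977.7'], ['4.0', '14173.8', '31487.2']]
```

Each inner list can then be written cell by cell into a table row.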

answered Oct 02 '22 00:10

ev-br

The precision is not being "lost"; you never had the precision in the first place. The value 15294.7 cannot be represented exactly in single precision (i.e. np.float32); the best approximation is 15294.70019...:

In [1]: x = np.array([15294.7], dtype=np.float32)

In [2]: x
Out[2]: array([ 15294.70019531], dtype=float32)

See http://floating-point-gui.de/

Using np.float64 gives you a better approximation, but it still cannot represent 15294.7 exactly.
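
One way to see the stored values exactly is to convert them to decimal.Decimal, which shows every digit of the binary value. This is only an illustrative sketch; the variable names are made up for the example:

```python
from decimal import Decimal

import numpy as np

x32 = np.float32(15294.7)
x64 = np.float64(15294.7)

# Decimal(float) converts the binary value exactly, with no rounding.
exact32 = Decimal(float(x32))
exact64 = Decimal(float(x64))

print(exact32)  # 15294.7001953125
print(exact64)  # many more digits, much closer to 15294.7 but not equal to it
```

Neither value equals the decimal number 15294.7; float64 is simply off by a far smaller amount.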

If you want text output that is formatted with a single decimal digit, use a function designed for formatted text output, such as np.savetxt:

In [56]: x = np.array([[15294.7, 32977.7],[14173.8, 31487.2]], dtype=np.float32) 

In [57]: x
Out[57]: 
array([[ 15294.70019531,  32977.69921875],
       [ 14173.79980469,  31487.19921875]], dtype=float32)

In [58]: np.savetxt("data.txt", x, fmt="%.1f", delimiter=",")

In [59]: !cat data.txt
15294.7,32977.7
14173.8,31487.2

If you really need a numpy array of nicely formatted strings, you could do something like this:

In [63]: def myfmt(r):
   ....:     return "%.1f" % (r,)
   ....: 

In [64]: vecfmt = np.vectorize(myfmt)

In [65]: vecfmt(x)
Out[65]: 
array([['15294.7', '32977.7'],
       ['14173.8', '31487.2']], 
      dtype='|S64')

If you use either of those methods, there is no need to pass the data through around first; rounding will occur as part of the formatting process.

answered Oct 02 '22 02:10

Warren Weckesser
