
Precision of numpy array lost after tolist

Tags:

numpy

I have a numpy array in which every number has been rounded to a fixed precision (using around(x, 1)).

[[     3.   15294.7  32977.7   4419.5    978.4    504.4    123.6]
 [     4.   14173.8  31487.2   3853.9    967.8    410.2    107.1]
 [     5.   15323.5  34754.5   3738.7   1034.7    376.1    105.5]
 [     6.   17396.7  41164.5   3787.4   1103.2    363.9    109.4]
 [     7.   19665.5  48967.6   3900.9   1161.     362.1    115.8]
 [     8.   21839.8  56922.5   4037.4   1208.2    365.9    123.5]
 [     9.   23840.6  64573.8   4178.1   1247.     373.2    131.9]
 [    10.   25659.9  71800.2   4314.8   1279.5    382.7    140.5]
 [    11.   27310.3  78577.7   4444.3   1307.1    393.7    149.1]
 [    12.   28809.1  84910.4   4565.8   1331.     405.5    157.4]]

I'm trying to convert every number into a string so that I can write them into a Word table using python-docx. But the result of the tolist() function is a total mess. The precision of the numbers is lost, resulting in very long output.

[['3.0',
  '15294.7001953',
  '32977.6992188',
  '4419.5',
  '978.400024414',
  '504.399993896',
  '123.599998474'],
 ['4.0',
  '14173.7998047',
  '31487.1992188',
  '3853.89990234',
  '967.799987793',
  '410.200012207',
  '107.099998474'],
.......

Besides the tolist() function, I also tried [[str(e) for e in a] for a in m]. The result is the same. This is very annoying. How can I convert to string easily while maintaining the precision? Thanks!

asked Dec 08 '13 14:12 by sanqiang




2 Answers

Something is going wrong in your conversion to strings. With just numbers:

>>> import numpy as np
>>> a = np.random.random(10)*30
>>> a
array([ 27.30713434,  10.25895255,  19.65843272,  23.93161555,
        29.08479175,  25.69713898,  11.90236158,   5.41050686,
        18.16481691,  14.12808414])
>>> 
>>> b = np.round(a, decimals=1)
>>> b
array([ 27.3,  10.3,  19.7,  23.9,  29.1,  25.7,  11.9,   5.4,  18.2,  14.1])
>>> b.tolist()
[27.3, 10.3, 19.7, 23.9, 29.1, 25.7, 11.9, 5.4, 18.2, 14.1]

Notice that np.round does not work in-place:

>>> a
array([ 27.30713434,  10.25895255,  19.65843272,  23.93161555,
        29.08479175,  25.69713898,  11.90236158,   5.41050686,
        18.16481691,  14.12808414])

If all you need is to convert numbers to strings:

>>> " ".join(str(_) for _ in np.round(a, 1)) 
'27.3 10.3 19.7 23.9 29.1 25.7 11.9 5.4 18.2 14.1'

EDIT: Apparently, np.round does not play nicely with float32 (other answers give reasons for this). A simple workaround is to cast your array explicitly to np.float64 or to the builtin float (the old np.float alias was just another name for the builtin float and was removed in NumPy 1.24):

>>> # prepare an array of float32 values
>>> a32  = (np.random.random(10) * 30).astype(np.float32)
>>> a32.dtype
dtype('float32')
>>> 
>>> # notice the use of .astype(np.float64)
>>> np.round(a32.astype(np.float64), 1)
array([  5.5,   8.2,  29.8,   8.6,  15.5,  28.3,   2. ,  24.5,  18.4,   8.3])
>>> 

EDIT2: As demonstrated by Warren in his answer, string formatting actually rounds things properly (try "%.1f" % (4.79,)). Thus there's no need to cast between float types. I'll leave my answer mainly as a reminder that using np.around is not the right thing to do in these circumstances.
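
To make the point in EDIT2 concrete, here is a minimal sketch of formatting float32 values directly, without np.around and without a cast. The array m is a made-up sample based on the first rows of the question; the names raw and cells are only for illustration.

```python
import numpy as np

# Hypothetical sample in the spirit of the question's data, stored as float32.
m = np.array([[3.0, 15294.7, 32977.7],
              [4.0, 14173.8, 31487.2]], dtype=np.float32)

# tolist() widens each float32 to a Python float (a double), which exposes
# the binary approximation that was actually stored.
raw = m.tolist()
print(raw[0][1])  # 15294.7001953125

# "%.1f" rounds while formatting, so each cell comes out with one decimal.
cells = [["%.1f" % v for v in row] for row in raw]
print(cells)
```

Each entry of cells is a plain Python str, so the nested list can be handed to whatever writes the table.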

answered Oct 02 '22 00:10 by ev-br


The precision is not being "lost"; you never had the precision in the first place. The value 15294.7 cannot be represented exactly in single precision (i.e. np.float32); the best approximation is 15294.70019...:

In [1]: x = np.array([15294.7], dtype=np.float32)

In [2]: x
Out[2]: array([ 15294.70019531], dtype=float32)

See http://floating-point-gui.de/

Using np.float64 gives you a better approximation, but it still cannot represent 15294.7 exactly.
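
One way to check this claim yourself is to print the exact stored values with the standard-library decimal module. This is only an illustrative sketch; the variable names are made up here.

```python
from decimal import Decimal

import numpy as np

target = Decimal("15294.7")

# Decimal(float) shows the exact binary value, with no rounding for display.
single = Decimal(float(np.float32(15294.7)))  # value stored in single precision
double = Decimal(15294.7)                     # value stored in double precision

print(single)  # 15294.7001953125
print(double)  # close to 15294.7, but not equal to it

# Neither is exactly 15294.7; the double is merely much closer.
print(single == target, double == target)
print(abs(double - target) < abs(single - target))
```

The last line prints True: float64 narrows the gap but does not close it.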

If you want text output that is formatted with a single decimal digit, use a function designed for formatted text output, such as np.savetxt:

In [56]: x = np.array([[15294.7, 32977.7],[14173.8, 31487.2]], dtype=np.float32) 

In [57]: x
Out[57]: 
array([[ 15294.70019531,  32977.69921875],
       [ 14173.79980469,  31487.19921875]], dtype=float32)

In [58]: np.savetxt("data.txt", x, fmt="%.1f", delimiter=",")

In [59]: !cat data.txt
15294.7,32977.7
14173.8,31487.2
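
If the goal is strings in memory rather than a file on disk, the same np.savetxt call can write to a text buffer. The sketch below assumes Python 3 and a reasonably recent NumPy; buf and rows are illustrative names, not part of any API.

```python
import io

import numpy as np

x = np.array([[15294.7, 32977.7], [14173.8, 31487.2]], dtype=np.float32)

# Write the formatted text into an in-memory buffer instead of data.txt.
buf = io.StringIO()
np.savetxt(buf, x, fmt="%.1f", delimiter=",")

# Split the text back into one list of cell strings per row.
rows = [line.split(",") for line in buf.getvalue().splitlines()]
print(rows)
```

This round trip through text is heavier than formatting each value directly, but it reuses the exact fmt string shown above.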

If you really need a numpy array of nicely formatted strings, you could do something like this:

In [63]: def myfmt(r):
   ....:     return "%.1f" % (r,)
   ....: 

In [64]: vecfmt = np.vectorize(myfmt)

In [65]: vecfmt(x)
Out[65]: 
array([['15294.7', '32977.7'],
       ['14173.8', '31487.2']], 
      dtype='|S64')

If you use either of those methods, there is no need to pass the data through around first; rounding will occur as part of the formatting process.

answered Oct 02 '22 02:10 by Warren Weckesser