 

Weird numpy.sum behavior when adding zeros


I understand how mathematically equivalent arithmetic operations can give different results due to numerical errors (e.g. summing floats in different orders).

However, it surprises me that adding zeros to a sum can change the result. I thought that x + 0. == x always holds for floats, no matter what.

Here's an example. I expected all the lines to be exactly zero. Can anybody please explain why this happens?

import numpy as np

M = 4  # number of random values
Z = 4  # number of additional zeros
for i in range(20):
    a = np.random.rand(M)
    b = np.zeros(M+Z)
    b[:M] = a
    print(a.sum() - b.sum())

Output:

-4.4408920985e-16
0.0
0.0
0.0
4.4408920985e-16
0.0
-4.4408920985e-16
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
2.22044604925e-16
0.0
4.4408920985e-16
4.4408920985e-16
0.0

It seems not to happen for smaller values of M and Z.

I also made sure a.dtype==b.dtype.

Here is one more example, which also shows that Python's built-in sum behaves as expected:

a = np.array([0.1,      1.0/3,      1.0/7,      1.0/13, 1.0/23])
b = np.array([0.1, 0.0, 1.0/3, 0.0, 1.0/7, 0.0, 1.0/13, 1.0/23])
print(a.sum() - b.sum())  # => -1.11022302463e-16
print(sum(a) - sum(b))    # => 0.0

I'm using NumPy 1.9.2.

shx2 asked Jun 23 '15 09:06




1 Answer

Short answer: You are seeing the difference between

a + b + c + d 

and

(a + b) + (c + d) 

which, because of floating-point rounding, do not necessarily give the same result. Here a + b + c + d is evaluated left to right, i.e. as ((a + b) + c) + d.
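To make the grouping effect concrete, here is a minimal sketch in plain Python (no NumPy needed). The values are chosen for illustration and are not taken from the question: 2**-53 is half the gap between 1.0 and the next double, so adding it to 1.0 rounds back to 1.0, while two of them added together first are large enough to survive.

```python
# Illustrative values (not from the question): u is half the gap between
# 1.0 and the next representable double.
u = 2.0 ** -53
a, b, c, d = 1.0, u, u, u

# Left-to-right evaluation: ((a + b) + c) + d
sequential = a + b + c + d

# Pairwise grouping: (a + b) + (c + d)
pairwise = (a + b) + (c + d)

print(sequential)             # 1.0
print(pairwise)               # 1.0000000000000002
print(pairwise - sequential)  # 2.220446049250313e-16
```

Every individual addition is correctly rounded in both versions; only the order in which the intermediate results are rounded differs.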

Long answer: NumPy implements pairwise summation, which improves both speed (it is easier to vectorize) and rounding error.

The NumPy sum implementation (function pairwise_sum_@TYPE@ in the NumPy source) essentially does the following:

  1. If the length of the array is less than 8, a regular for-loop summation is performed. This is why the strange result is not observed if M + Z < 8 in your case: the same for-loop summation is used for both arrays.
  2. If the length is between 8 and 128, it accumulates the sums in 8 bins r[0] to r[7], then sums them by ((r[0] + r[1]) + (r[2] + r[3])) + ((r[4] + r[5]) + (r[6] + r[7])).
  3. Otherwise, it recursively sums two halves of the array.
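The three cases above can be sketched in pure Python. This is an illustrative approximation, not NumPy's actual C code: the name pairwise_sum_sketch is hypothetical, and the handling of leftover elements in case 2 and the choice of split point in case 3 are assumptions on my part rather than details given in this answer.

```python
def pairwise_sum_sketch(values, block=128):
    """Sum a list of floats following the three cases described above."""
    n = len(values)
    if n < 8:
        # Case 1: plain left-to-right loop.
        total = 0.0
        for v in values:
            total += v
        return total
    if n <= block:
        # Case 2: eight running accumulators, combined pairwise at the end.
        r = list(values[:8])
        full = n - (n % 8)  # number of elements in complete groups of 8
        for i in range(8, full, 8):
            for k in range(8):
                r[k] += values[i + k]
        total = ((r[0] + r[1]) + (r[2] + r[3])) + ((r[4] + r[5]) + (r[6] + r[7]))
        # Assumption: elements past the last full group are added one by one.
        for v in values[full:]:
            total += v
        return total
    # Case 3: split in two and recurse.
    # Assumption: the first part is rounded down to a multiple of 8.
    half = n // 2
    half -= half % 8
    return (pairwise_sum_sketch(values[:half], block)
            + pairwise_sum_sketch(values[half:], block))


# Illustrative values (not from the question).
u = 2.0 ** -53
a = [1.0, u, u, u]   # 4 values: handled by case 1
b = a + [0.0] * 4    # 8 values: handled by case 2

print(pairwise_sum_sketch(a))  # 1.0
print(pairwise_sum_sketch(b))  # 1.0000000000000002
```

Padding with zeros pushes the input from case 1 into case 2, so the same four non-zero values are combined in a different order.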

Therefore, in the first case you get a.sum() = ((a[0] + a[1]) + a[2]) + a[3], and in the second case b.sum() = (a[0] + a[1]) + (a[2] + a[3]) (the four zero bins add nothing), which can lead to a.sum() - b.sum() != 0.
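That the zeros themselves are harmless can be checked directly. The sketch below uses Python's random module with an arbitrary seed (my choice, for reproducibility only) and plain floats instead of NumPy arrays. It confirms that x + 0.0 == x for every sampled value and that the eight-bin expression for the zero-padded input reduces to (a[0] + a[1]) + (a[2] + a[3]), so any difference from the sequential sum comes from grouping alone.

```python
import random

random.seed(0)  # arbitrary seed, chosen only for reproducibility

zero_identity_holds = True
padded_equals_pairwise = True
mismatches = 0
trials = 10000

for _ in range(trials):
    a = [random.random() for _ in range(4)]

    # Adding zero never changes a float.
    zero_identity_holds &= all(x + 0.0 == x for x in a)

    sequential = ((a[0] + a[1]) + a[2]) + a[3]
    pairwise = (a[0] + a[1]) + (a[2] + a[3])
    # Eight-bin combination for [a0, a1, a2, a3, 0, 0, 0, 0].
    padded = ((a[0] + a[1]) + (a[2] + a[3])) + ((0.0 + 0.0) + (0.0 + 0.0))

    padded_equals_pairwise &= (padded == pairwise)
    mismatches += (sequential != pairwise)

print(zero_identity_holds)     # True
print(padded_equals_pairwise)  # True
print(mismatches, "of", trials, "trials differ between the two groupings")
```

The mismatch count depends on the random values drawn, so no particular number is claimed here.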

jornb87 answered Sep 28 '22 06:09