I am new to Python and NumPy, so please excuse me if this question is rudimentary! I have a sorted array of negative values:
>>>neg
[ -1.53507843e+02 -1.53200012e+02 -1.43161987e+02 ..., -6.37326136e-10 -3.97518490e-10 -3.73480691e-10]
>>>neg.shape
(12922508,)
I need to concatenate this array with a mirrored copy of itself (the same values with positive sign), so that the resulting distribution is centred on zero, and then find its standard deviation. So I do the following:
>>>pos=-1*neg
>>>pos=pos[::-1] #Just to make it look symmetric for the display below!
>>>total=np.hstack((neg,pos))
>>>total
[-153.50784302 -153.20001221 -143.1619873 ..., 143.1619873 153.20001221 153.50784302]
>>>total.shape
(25845016,)
So far everything looks fine, but strangely the sum of this new array is not zero:
>>>numpy.sum(total)
11610.6
The standard deviation is also nowhere near what I was expecting, but I guess that problem has the same root as this one: why doesn't the sum come out to zero?
When I apply this method to a small array, for example [-5, -3, -2], the sum is zero. So I guess the problem lies in the length of the array (over 20 million elements). Is there any way to deal with this problem?
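For reference, a small script with made-up float32 data shows the same behaviour (I am assuming single precision here; with float64 the residual is much smaller):

import numpy

# Hypothetical stand-in data: same shape, random float32 negatives
neg = -200*numpy.random.rand(12922508).astype(numpy.float32)
total = numpy.hstack((neg, -neg[::-1]))
print(numpy.sum(total))  # mathematically zero, but comes out clearly nonzero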
If anyone could help me with this, I would be most grateful.
As noted in the comments, you get floating-point roundoff problems from summing up many millions of same-signed numbers: once the running total has grown large, the low-order bits of each small addend are rounded away, and those errors accumulate. One possible way around this is to interleave the positive and negative numbers in the combined array, so that the intermediate results during summation always stay small:
import numpy

neg = -100*numpy.random.rand(int(20e6))   # 20 million random negative values
pos = -neg                                # their positive mirror images
combined = numpy.zeros(len(neg)+len(pos))
combined[::2] = neg                       # even slots: negatives
combined[1::2] = pos                      # odd slots: positives
Now combined.sum() should be pretty close to zero.
This approach may also help to improve the precision of the standard deviation computation.
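If reordering the data is not convenient, another standard option is to let NumPy accumulate in double precision via the dtype argument, which both sum and std accept (again a sketch with made-up float32 data):

import numpy

neg = -100*numpy.random.rand(int(20e6)).astype(numpy.float32)
total = numpy.hstack((neg, -neg[::-1]))

# Accumulate in float64 even though the data itself is float32:
print(total.sum(dtype=numpy.float64))
print(total.std(dtype=numpy.float64))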