In summing the first 100,000,000 positive integers using the following:

import numpy as np
np.arange(1, 100000001).sum()

I get back 987459712, which does not match the formula N(N+1)/2 for N = 100000000; the formula gives 5000000050000000.
Before posting, I wrote the following, which returns True:

np.arange(1, 65536).sum() == ((65535+1) * 65535)/2

However, the number 65536 seems to be a critical point, as

np.arange(1, 65537).sum() == ((65536+1) * 65536)/2

returns False. For integers greater than 65536 the code returns False, whereas integers below this threshold return True.
Could someone explain either what I've done wrong in calculating the sum, or what is going on with the code?
It seems that numpy sometimes has a hard time guessing the correct datatype. On my system (Windows 10 64-bit, Python 3.4.4, numpy 1.13.1):
>>> np.arange(1, 100000001).sum()
987459712
>>> np.arange(1, 100000001).dtype
dtype('int32')
But if we "help" numpy, it gets the correct result:

>>> np.arange(1, 100000001, dtype=np.int64).sum()
5000000050000000
The wrong result is simply due to 32-bit integer overflow.
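In fact, the overflow accounts exactly for the reported number: wrapping the true sum modulo 2**32 reproduces it, and since the residue happens to be below 2**31 it even shows up as a positive int32. A quick check in plain Python (no numpy needed):

>>> true_sum = 100000000 * (100000000 + 1) // 2
>>> true_sum
5000000050000000
>>> true_sum % 2**32    # what a 32-bit accumulator wraps to
987459712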
It isn't really that numpy has a hard time guessing things; it's just that the default integer type is the same as the C long type:

int_ : Default integer type (same as C long; normally either int64 or int32)

On Windows, a C long is 32 bits even on 64-bit builds, so that's what's used by default: int32.
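If you want to check what your own platform defaults to, the following works; note that the outputs shown here are what you would see on a platform with 64-bit C longs, and on Windows you would get dtype('int32') instead:

>>> import numpy as np
>>> np.arange(1).dtype          # platform default integer dtype
dtype('int64')
>>> np.iinfo(np.int32).max      # largest value a signed 32-bit int holds
2147483647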
As DeepSpace suggested, setting dtype to int64 does the trick. This can be done either in arange or in the sum method.
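For concreteness, here is a minimal sketch of both variants; either one keeps the accumulation in 64 bits, so the result is the exact sum:

import numpy as np

# Option 1: make the array itself 64-bit
np.arange(1, 100000001, dtype=np.int64).sum()    # 5000000050000000

# Option 2: keep the default array, but accumulate in 64 bits
np.arange(1, 100000001).sum(dtype=np.int64)      # 5000000050000000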
Additionally, you wrote:

Before posting, I wrote the following, which returns True:

np.arange(1, 65536).sum() == ((65535+1) * 65535)/2

However, the number 65536 seems to be a critical point, as

np.arange(1, 65537).sum() == ((65536+1) * 65536)/2

returns False.
and this is explained by the fact that the second sum exceeds int32's max value while the first doesn't (using int64 here so the sums themselves don't overflow while we check):

>>> np.arange(1, 65536, dtype=np.int64).sum() < np.iinfo(np.int32).max
True
>>> np.arange(1, 65537, dtype=np.int64).sum() < np.iinfo(np.int32).max
False
Of course, the Python side of each comparison is computed correctly, thanks to Python 3's arbitrary-precision ints.
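You can verify the exact location of the threshold with plain Python integers, which cannot overflow:

>>> import numpy as np
>>> np.iinfo(np.int32).max
2147483647
>>> 65535 * 65536 // 2    # sum of 1..65535 -- still fits in int32
2147450880
>>> 65536 * 65537 // 2    # sum of 1..65536 -- exceeds int32 max
2147516416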
This is also why many of us weren't able to reproduce the issue: on most Unix systems the default integer size on 64-bit machines is int64 (since the C long is 64 bits there), so the sum of those ints equals the expected value.
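If you want the original computation to give the same correct answer on every platform, pinning the dtype explicitly and checking it against the closed-form formula is a simple, portable sanity check (a sketch using only the calls already shown above):

import numpy as np

N = 100000000
# Python ints are arbitrary precision, so the right-hand side is exact
assert np.arange(1, N + 1, dtype=np.int64).sum() == N * (N + 1) // 2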