
OverflowError occurs when using cython with a large int

Python 3.4, Windows 10, Cython 0.21.1

I'm compiling this function to C with Cython:

def weakchecksum(data):
   """
   Generates a weak checksum from an iterable set of bytes.
   """
   cdef long a, b, l
   a = b = 0
   l = len(data)
   for i in range(l):
       a += data[i]
       b += (l - i)*data[i]

   return (b << 16) | a, a, b

which produces this error: "OverflowError: Python int too large to convert to C long"

I've also tried declaring them as unsigned longs. What type do I use to work with really large numbers? If the value is too large for a C long, are there any workarounds?

user2682863 asked Nov 12 '14 01:11


2 Answers

Cython compiles .pyx files to C, so the result depends on the underlying C compiler.

The sizes of the integer types in C vary across platforms and operating systems, and the C standard doesn't dictate an exact implementation.

However, there are de facto implementation conventions.

Windows, both 32-bit and 64-bit, uses 4 bytes (32 bits) for int and long, and 8 bytes (64 bits) for long long. The difference between Win32 and Win64 is the size of a pointer (32 bits on Win32, 64 bits on Win64). (See "Data Type Ranges" on MSDN.)

Linux uses another model: int is 32 bits on both linux-32 and linux-64, and long long is always 64 bits. long and pointers vary: 32 bits on linux-32 and 64 bits on linux-64.

Long story short: if you need the integer type with the maximum capacity that doesn't change between platforms, use long long (or unsigned long long).

The data range for long long is [–9223372036854775808, 9223372036854775807].
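The sizes described above can be checked on a given machine from Python itself. This is a minimal sketch using the standard ctypes module, which reports the sizes used by the C compiler that built the interpreter; the helper name int_range is made up for illustration.

```python
import ctypes

def int_range(ctype):
    """Return (min, max) of a signed ctypes integer type on this platform."""
    bits = 8 * ctypes.sizeof(ctype)
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

# Print the size and range of the three types discussed above.
for name, ctype in (("int", ctypes.c_int),
                    ("long", ctypes.c_long),
                    ("long long", ctypes.c_longlong)):
    low, high = int_range(ctype)
    print("{:<9} {} bytes  [{}, {}]".format(name, ctypes.sizeof(ctype), low, high))
```

On Windows the long line should report 4 bytes; on 64-bit Linux it should report 8.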

If you need numbers with arbitrary precision, there is the GMP library, the de facto standard for high-precision arithmetic. Python has a wrapper for it called gmpy2.
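For comparison, Python's built-in int is itself arbitrary-precision, so the same loop written without any cdef declarations never overflows (at the cost of speed). A minimal sketch, assuming plain Python integers are acceptable; weakchecksum_py and C_LONG_MAX_32 are names made up for this illustration. It also shows how quickly b outgrows a 32-bit long.

```python
def weakchecksum_py(data):
    """Unbounded version of the checksum: Python ints grow as needed."""
    a = b = 0
    n = len(data)
    for i, byte in enumerate(data):
        a += byte
        b += (n - i) * byte
    return (b << 16) | a, a, b

# Largest value of a signed 32-bit C long (the size used on Windows).
C_LONG_MAX_32 = 2 ** 31 - 1

# 10000 bytes of 0xFF are already enough to push b past that limit.
checksum, a, b = weakchecksum_py(bytes([255]) * 10000)
print(b > C_LONG_MAX_32)
```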

Andrew Svetlov answered Oct 19 '22 14:10


If you make sure that your calculations are done in C (for instance, declare i as long, and either put the data element into a cdef'd variable or cast it before the calculation), you won't get this error. Your actual results, though, could vary by platform, depending (potentially) on the exact assembly code generated and on how overflows end up being treated. There are better algorithms for this, as @cod3monk3y has noted (look at the "simple checksums" link).
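One way to avoid both the overflow and the platform dependence is to reduce a and b modulo 2^16 on every step, as rsync-style weak checksums do. The sketch below assumes that this is acceptable for the checksum's purpose, which the question does not state; weakchecksum_masked is a hypothetical name. It is written in plain Python so it runs as is; the same masking should carry over to cdef variables in Cython.

```python
def weakchecksum_masked(data):
    """Weak checksum with a and b kept within 16 bits on every step.

    Masking is an assumption made for this sketch, not part of the
    original function: it changes the values returned for large inputs.
    """
    a = b = 0
    n = len(data)
    for i, byte in enumerate(data):
        a = (a + byte) & 0xFFFF
        b = (b + (n - i) * byte) & 0xFFFF
    # b occupies the high 16 bits and a the low 16 bits.
    return (b << 16) | a, a, b

checksum, a, b = weakchecksum_masked(bytes([255]) * 100000)
print(checksum <= 0xFFFFFFFF)
```

With this masking the combined value always fits in an unsigned 32-bit integer, whatever the input length.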

shaunc answered Oct 19 '22 15:10