Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Numpy 256-bit computing

Tags:

python

numpy

I have some large numbers that I am using for Elliptic Curve Cryptography. So far I have been doing all the computations in plain python which natively support arbitrary sized numbers by the use of "bignum" integer type. However, I would like to vectorize some of these operations (for performance reasons) and would like to use an array containing 256 bit unsigned integers. But AFAIK the maximum size of an integer in numpy is 64-bits by using np.longlong (or np.ulonglong for the unsigned version). An idea from the following post was to represent each 256-bit number as an array with four 64-bit integers:

large = 105951305240504794066266398962584786593081186897777398483830058739006966285013
arr = np.array([large.to_bytes(32,'little')]).view('P')
arr

Output:

array([18196013122530525909, 15462736877728584896, 12869567647602165677,
       16879016735257358861], dtype=uint64)

And I can convert it back to native python "bignum" by doing:

int.from_bytes(arr.tobytes(), 'little')

Output:

105951305240504794066266398962584786593081186897777398483830058739006966285013

However, I would like to do some operations before converting it back to "bignum":

arr2 = arr + 1
int.from_bytes(arr2.tobytes(), 'little')

Output:

105951305240504794072543500697971467357257258687906003363414235534976478560982

Which is ofcourse not equal to large + 1


Is it possible to handle 256-bit numbers in numpy? If not, does there exist any suitable workarounds?

like image 835
Kevin Avatar asked Apr 11 '26 02:04

Kevin


1 Answers

Let's say you have a couple of arrays of four bytes:

a = np.array([0b0000_1000, 0b1000_1100, 0b0111_1111, 0b1111_0000], dtype=np.uint8)  # [  8, 140, 127, 240]
b = np.array([0b1111_0000, 0b1111_0000, 0b1111_0000, 0b1111_0000], dtype=np.uint8)  # [240, 240, 240, 240]

Now you want to add them up and get something like

c = np.array([0b1111_1001, 0b0111_1101, 0b0111_0000, 0b1110_0000], dtype=np.uint8)  # [249, 125, 112, 224]

I've deliberately avoided something that requires carrying over in the topmost byte. In this case the result was computed with

ai = int.from_bytes(a.tobytes(), byteorder='big')
bi = int.from_bytes(b.tobytes(), byteorder='big')
c = np.frombuffer((ai + bi).to_bytes(4, 'big'), dtype=np.uint8)

So how to do that without converting to and from int? You can't do a naive c = a + b:

[1111_1000, 0111_1100, 0110_1111, 1110_0000]
          +1         +1         +1

Notice that the carries at the marked locations were all dropped. You can check for carries without triggering overflow using

0xFF - a < b

The good news is that you can only ever carry over one bit per byte: 0xFF + 0xFF + 1 == 0x1FF. You also can't carry over from adding 1 if the sum just carried over: (0xFF + 0xFF) & 0xFF == 0xFE. Here is a simple implementation of proper addition:

c = a.copy()
while b.sum():
    carry = np.r_[0xFF - c[1:] < b[1:], False].view(np.uint8)
    c += b
    b = carry

This loop will generally run for 1-2 iterations. In the absolute worst-case, it may run for up to N-1 iterations, where N=256 in your case, or N=4 in this example. Using a larger datatype will mean that you can reduce the number of iterations in the worst case (and probably speed up the best case too). The only caveat that you won't be able to use the ...view(np.uint8) trick to convert your boolean mask. But that's actually a blessing in disguise, because you can write

c = a.copy()
carry = np.zeros_like(a)
cbuf = carry.view(bool)[::a.itemsize]
# On big-endian use this instead:
#cbuf = carry.view(bool)[a.itemsize - 1::a.itemsize]
cbuf = cbuf[:-1]
m = np.iinfo(a.dtype).max
while b.sum():
    c += b
    np.less(m - c[1:], b[1:], out=cbuf)
    b = carry

Notice that this version not only avoids reallocating the carry buffer, but is completely agnostic to the itemsize of a and b.

Exercises for the reader:

  • Handling little-endian order correctly (my examples have the words arranged in big-endian order, unlike the question)
  • Adding sign support (comes out of the box if you have fixed-width twos-complement)
  • Actually vectorizing this operation across N dimensions
  • Implementing operations like multiplication
like image 137
Mad Physicist Avatar answered Apr 12 '26 16:04

Mad Physicist



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!