
Get modulo from two 4x64bit integer arrays

I use OpenCL for GPGPU programming, but unfortunately it has no native 256-bit integer support. I decided to split each 256-bit integer into four 64-bit integers. That works well for basic operations, but how can I compute the modulo of two such values?

I need to do this:

(uint256) % (uint256)

But with OpenCL, I can only have this:

[ (uint64), (uint64), (uint64), (uint64) ] % [ (uint64), (uint64), (uint64), (uint64) ]

So how can I achieve that? What algorithm should I use and, most importantly, which is the easiest to implement?

P.S. I need that for public key cryptography.

EDIT: I have implemented neither addition nor subtraction.

Alex S. asked Jun 06 '20 18:06


1 Answer

Here's an easy (and fairly efficient) algorithm that computes a % b using only subtraction, multiplication by 2, division by 2, and comparison, all of which are easy to implement for your uint256.

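uint256 modulo(uint256 a, uint256 b) {
  // Precondition: b != 0 (otherwise the first loop never terminates).
  int i = 0;
  // Scale b up by powers of two until it exceeds a.
  // Stop early if the top bit is set, since doubling would overflow.
  while (b <= a && (b >> 255) == 0) {
    b = b * 2;
    i++;
  }
  // Only has an effect when the loop above stopped to avoid overflow.
  if (b <= a) {
    a = a - b;
  }
  // Scale b back down, subtracting it whenever it still fits into a.
  while (i--) {
    b = b / 2;
    if (b <= a) {
      a = a - b;
    }
  }
  return a;
}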

Here's an example:

start: a = 40, b = 7
i = 1, a = 40, b = 14
i = 2, a = 40, b = 28
i = 3, a = 40, b = 56

i = 3, b = 28, a = 40 - 28 = 12
i = 2, b = 14, a = 12 (b > a so nothing happens)
i = 1, b = 7, a = 12 - 7 = 5
i = 0, so we stop and return a = 5
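
For the four-limb representation from the question, the same loop could be sketched in plain C roughly as follows. The `u256` struct, the limb order (least significant limb first) and all `u256_*` names are assumptions made for this illustration, not an existing API. The loop stops doubling once the top bit of b is set, which is one possible way to deal with the overflow in `b * 2`. In an OpenCL kernel you would presumably use `ulong` in place of `uint64_t`; the `assert` is only there for host-side testing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 256-bit type: w[0] is the least significant 64-bit limb. */
typedef struct { uint64_t w[4]; } u256;

/* Returns -1, 0 or 1 when a < b, a == b or a > b (compares from the top limb). */
static int u256_cmp(const u256 *a, const u256 *b) {
    for (int i = 3; i >= 0; i--) {
        if (a->w[i] != b->w[i]) return a->w[i] < b->w[i] ? -1 : 1;
    }
    return 0;
}

/* a -= b, assuming a >= b; the borrow travels from the low limb upwards. */
static void u256_sub(u256 *a, const u256 *b) {
    uint64_t borrow = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t d = a->w[i] - b->w[i];
        uint64_t b1 = a->w[i] < b->w[i];  /* borrow from the limb subtraction */
        uint64_t b2 = d < borrow;         /* borrow from subtracting the old borrow */
        a->w[i] = d - borrow;
        borrow = b1 | b2;
    }
}

/* a *= 2; the bit shifted out of the top limb is lost. */
static void u256_shl1(u256 *a) {
    for (int i = 3; i > 0; i--) a->w[i] = (a->w[i] << 1) | (a->w[i - 1] >> 63);
    a->w[0] <<= 1;
}

/* a /= 2 */
static void u256_shr1(u256 *a) {
    for (int i = 0; i < 3; i++) a->w[i] = (a->w[i] >> 1) | (a->w[i + 1] << 63);
    a->w[3] >>= 1;
}

/* Returns a % b; b must be nonzero. */
static u256 u256_mod(u256 a, u256 b) {
    assert((b.w[0] | b.w[1] | b.w[2] | b.w[3]) != 0);
    int i = 0;
    /* Double b while it still fits into a and doubling cannot overflow. */
    while (u256_cmp(&b, &a) <= 0 && (b.w[3] >> 63) == 0) {
        u256_shl1(&b);
        i++;
    }
    /* Needed only when the loop stopped because the top bit of b is set. */
    if (u256_cmp(&b, &a) <= 0) u256_sub(&a, &b);
    while (i--) {
        u256_shr1(&b);
        if (u256_cmp(&b, &a) <= 0) u256_sub(&a, &b);
    }
    return a;
}
```

With a = 40 and b = 7 stored in the lowest limb, `u256_mod` returns 5, the same result as the trace above.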

EDIT: Why does this work? The naive way of computing the remainder is the following:

int modulo(int a, int b) {
  while (a >= b) {
    a -= b;
  }
  return a;
}

The proposed solution uses the same idea, but more efficiently. We know that we will end up subtracting b from a exactly k times, but we don't know the value of k. k can be represented in binary as 2^0 * k_0 + 2^1 * k_1 + 2^2 * k_2 + .... The algorithm starts from the largest power 2^i and tries to subtract 2^i * b at each step, working down to 2^0. As a result, the number of iterations is logarithmic in k instead of linear.
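
To make the binary decomposition of k concrete, here is a small sketch on plain 64-bit integers that records each bit k_j while it computes the remainder. The name `divmod_u64` and its signature are made up for this illustration; the structure otherwise follows the doubling and halving loop, with an extra check so that doubling b never overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: stores a / b in *k and returns a % b (b must be nonzero). */
static uint64_t divmod_u64(uint64_t a, uint64_t b, uint64_t *k) {
    assert(b != 0);
    int i = 0;
    *k = 0;
    /* Scale b up to 2^i * b; stop before a doubling would overflow. */
    while (b <= a && (b >> 63) == 0) {
        b <<= 1;
        i++;
    }
    /* Walk back down: at step j, b holds 2^j times the original b. */
    for (int j = i; j >= 0; j--) {
        if (b <= a) {
            a -= b;
            *k |= (uint64_t)1 << j;  /* bit k_j of the quotient is 1 */
        }
        b >>= 1;
    }
    return a;
}
```

For a = 40 and b = 7 this yields k = 5 (binary 101) and remainder 5: the set bits k_2 and k_0 correspond to subtracting 2^2 * 7 = 28 and 2^0 * 7 = 7.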

Disclaimer: I wouldn't use this implementation in real cryptographic code, as it's prone to side-channel attacks (its execution time depends on the input).
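
As a rough illustration of what a less input-dependent variant might look like, the sketch below uses bit-serial shift-and-subtract division (a different formulation from the loop above) on plain 64-bit integers. It always runs 64 iterations and replaces the `if` with a mask. The name `mod_fixed_u64` is an assumption for this example, and this is not a vetted constant-time implementation: a compiler is free to turn the mask arithmetic back into branches, so the generated code would need to be checked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a % b with a fixed iteration count and no
   data-dependent branch in the loop body (b must be nonzero). */
static uint64_t mod_fixed_u64(uint64_t a, uint64_t b) {
    assert(b != 0);
    uint64_t r = 0;
    for (int i = 63; i >= 0; i--) {
        /* Bring down the next bit of a. r never exceeds the bits of a
           consumed so far, so this shift cannot overflow. */
        r = (r << 1) | ((a >> i) & 1);
        uint64_t diff = r - b;
        /* borrow is 1 exactly when r < b (borrow-out of r - b). */
        uint64_t borrow = ((~r & b) | ((~r | b) & diff)) >> 63;
        /* mask is all ones when r >= b, zero otherwise. */
        uint64_t mask = borrow - 1;
        r = (diff & mask) | (r & ~mask);
    }
    return r;
}
```

The same pattern would carry over to the four-limb case by running 256 iterations and applying the mask to every limb, at the cost of always doing the maximum amount of work.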

Igor answered Oct 22 '22 22:10