Optimising multiplication modulo a small prime

Question

I need to do the following operation many times:

Take two integers a, b
Compute a * b mod p, where p = 1000000007 and a, b are of the same order of magnitude as p

My gut feeling is the naive

result = a * b
result %= p

is inefficient. Can I optimise multiplication modulo p much like exponentiation modulo p is optimised with pow(a, b, p)?

BlueRaja - Danny Pflughoeft · Accepted Answer

You mention that "a, b are of the same order of magnitude as p." Often in cryptography this means that a,b are large numbers near p, but strictly less-than p.

If this is the case, then you could use the simple identity

$a-p \equiv a \pmod{p}$

to turn your calculation into

result = ((a-p)*(b-p))%p

You've then turned one large multiplication into two large subtractions and a small multiplication. You'll have to profile to see which is faster.

hardmath · Answer

To do this calculation in assembly, but have it callable from Python, I'd try inline assembly from a Python module written in C. Both GCC and MSVC compilers feature inline assembly, only with differing syntax.

Note that our modulus p = 1000000007 just fits into 30-bits. The result desired (a*b)%p can be computed in Intel 80x86 registers given some weak restrictions on a,b not being much bigger than p.

Restrictions on size of a,b

(1) a,b are 32-bit unsigned integers

(2) a*b is less than p << 32, i.e. p times 2^32

In particular if a,b are each less than 2*p, overflow will be avoided. Given (1), it also suffices that either one of them is less than p.

The Intel 80x86 instruction MUL can multiply two 32-bit unsigned integers and store the 64-bit result in accumulator register pair EDX:EAX. Some details and quirks of MUL are discussed in Section 10.2.1 of this helpful summary.

The instruction DIV can then divide this 64-bit result by a 32-bit constant (the modulus p), storing the quotient in EAX and the remainder in EDX. See Section 10.2.2 of the last link. The result we want is that remainder.

It is this division instruction DIV that entails a risk of overflow, should the 64-bit product in numerator EDX:EAX give a quotient larger than 32-bits by failing to satisfy (2) above.

I'm working on a code snippet in C/inline assembly for "proof of concept". However the maximum benefit in speed will depend on batching up arrays of data a,b to process, amortizing the overhead of function calls, etc. in Python (if that is the target platform).

Optimising multiplication modulo a small prime

Tags:

python

math

cryptography

Randomblue

2 Answers

BlueRaja - Danny Pflughoeft

hardmath

Recent Activity

Donate For Us

Optimising multiplication modulo a small prime

Tags:

python

math

cryptography

Randomblue

2 Answers

BlueRaja - Danny Pflughoeft

hardmath

Related questions

Recent Activity

Donate For Us