Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast modulo 3 or division algorithm?

is there a fast algorithm, similar to power of 2, which can be used with 3, i.e. n%3. Perhaps something that uses the fact that if sum of digits is divisible by three, then the number is also divisible.

This leads to a next question. What is the fast way to add digits in a number? I.e. 37 -> 3 +7 -> 10 I am looking for something that does not have conditionals as those tend to inhibit vectorization

thanks

like image 606
Anycorn Avatar asked Nov 08 '09 17:11

Anycorn


2 Answers

4 % 3 == 1, so (4^k * a + b) % 3 == (a + b) % 3. You can use this fact to evaluate x%3 for a 32-bit x:

x = (x >> 16) + (x & 0xffff);
x = (x >> 10) + (x & 0x3ff);
x = (x >> 6) + (x & 0x3f);
x = (x >> 4) + (x & 0xf);
x = (x >> 2) + (x & 0x3);
x = (x >> 2) + (x & 0x3);
x = (x >> 2) + (x & 0x3);
if (x == 3) x = 0;

(Untested - you might need a few more reductions.) Is this faster than your hardware can do x%3? If it is, it probably isn't by much.

like image 156
Keith Randall Avatar answered Oct 23 '22 01:10

Keith Randall


This comp.compilers item has a specific recommendation for computing modulo 3.

An alternative, especially if the maximium size of the dividend is modest, is to multiply by the reciprocal of 3 as a fixed-point value, with enough bits of precision to handle the maximum size dividend to compute the quotient, and then subtract 3*quotient from the the dividend to get the remainder. All of these multiplies can be implemented with a fixed sequence of shifts-and-adds. The number of instructions will depend on the bit pattern of the reciprocal. This works pretty well when the dividend max is modest in size.

Regarding adding digits in the number... if you want to add the decimal digits, you're going to end up doing what amounts to a number-conversion-to-decimal, which involves divide by 10 somewhere. If you're willing to settle for adding up the digits in base2, you can do this with an easy shift-right and add loop. Various clever tricks can be used to do this in chunks of N bits to speed it up further.

like image 45
Ira Baxter Avatar answered Oct 22 '22 23:10

Ira Baxter