
Is there any way to write "mod 31" without modulus/division operators?

Getting the modulus of a number can easily be done without the modulus operator or division if the divisor is a power of 2. In that case, the following formula holds: x % y = (x & (y − 1)). This is often more performant on many architectures. Can the same be done for mod 31?

int mod31(int a){ return a % 31; };
MaiaVictor asked Sep 25 '14 20:09


1 Answer

Here are two ways to approach this problem. The first uses a common bit-twiddling technique and, if carefully optimized, can beat hardware division. The second substitutes a multiply for the divide, similar to the optimization performed by gcc, and is far and away the fastest. The bottom line is that there's not much point trying to avoid the % operator if the second argument is constant, because gcc's got it covered. (And probably other compilers, too.)

The following function is based on the fact that x is the same (mod 31) as the sum of the base-32 digits of x. That's true because 32 is 1 mod 31, and consequently any power of 32 is 1 mod 31. So each "digit" position in a base-32 number contributes the digit * 1 to the mod 31 sum. And it's easy to get the base-32 representation: we just take the bits five at a time.

(Like the rest of the functions in this answer, it will only work for non-negative x).

unsigned mod31(unsigned x) {
  unsigned tmp;
  for (tmp = 0; x; x >>= 5) {
    tmp += x & 31;
  }
  // Here we assume that there are at most 160 bits in x
  tmp = (tmp >> 5) + (tmp & 31);
  return tmp >= 31 ? tmp - 31 : tmp;
}
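As a worked example of the digit-sum argument (the value 2014 is chosen only for illustration), the function above effectively performs these steps:

```latex
\begin{aligned}
2014 &= 1\cdot 32^2 + 30\cdot 32 + 30 && \text{base-32 digits } (1, 30, 30)\\
1 + 30 + 30 &= 61 = 1\cdot 32 + 29 && \text{sum of digits, itself two base-32 digits}\\
1 + 29 &= 30 < 31 && \text{fold once more; no final subtraction needed}\\
2014 \bmod 31 &= 30 && \text{check: } 2014 = 31\cdot 64 + 30
\end{aligned}
```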

For a specific integer size, you could unroll the loop and quite possibly beat division. (See @chux's answer for a way to convert the loop into O(log bits) operations instead of O(bits).) It's more difficult to beat gcc, which avoids division when the divisor is a constant known at compile time.

In a very quick benchmark using unsigned 32 bit integers, the naive unrolled loop took 19 seconds and a version based on @chux's answer took only 13 seconds, but gcc's x%31 took 9.7 seconds. Forcing gcc to use a hardware divide (by making the division non-constant) took 23.4 seconds, and the code as shown above took 25.6 seconds. Those figures should be taken with several grains of salt. The times are for computing i%31 for all possible values of i, on my laptop using -O3 -march=native.
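Before timing any of these variants, it is worth confirming that they agree with the % operator. The following is a minimal sketch of such a check, not the benchmark code used for the figures above; `count_mismatches` is a hypothetical helper name, and `mod31_digits` is the digit-summing loop restated for a fixed 32-bit width.

```c
#include <assert.h>
#include <stdint.h>

/* Candidate under test: the digit-summing loop from this answer,
   restated for a fixed 32-bit width. */
static uint32_t mod31_digits(uint32_t x) {
    uint32_t tmp = 0;
    for (; x; x >>= 5) {
        tmp += x & 31;
    }
    tmp = (tmp >> 5) + (tmp & 31);
    return tmp >= 31 ? tmp - 31 : tmp;
}

/* Hypothetical helper: counts the inputs on which fn disagrees with x % 31,
   visiting x = 0, stride, 2*stride, ... up to UINT32_MAX.
   stride must be nonzero; stride == 1 checks every 32-bit value. */
static uint64_t count_mismatches(uint32_t (*fn)(uint32_t), uint32_t stride) {
    uint64_t bad = 0;
    /* A 64-bit counter lets the loop terminate without wrapping around. */
    for (uint64_t x = 0; x <= UINT32_MAX; x += stride) {
        bad += fn((uint32_t)x) != (uint32_t)x % 31;
    }
    return bad;
}
```

Calling `count_mismatches(mod31_digits, 1)` walks every 32-bit value, which matches the range described above; a larger stride gives a quick spot check.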

gcc avoids 32-bit division by a constant by replacing it with what is essentially a 64-bit multiplication by the inverse of the constant followed by a right shift. (The actual algorithm does a bit more work to avoid overflows.) The procedure was implemented more than 20 years ago in gcc v2.6, and the paper which describes the algorithm is available on the gmp site. (GMP also uses this trick.)

Here's a simplified version: Say we want to compute n // 31 for some unsigned 32-bit integer n (using the pythonic // to indicate truncated integer division, and ^ to indicate exponentiation). We use the "magic constant" m = 2^32 // 31, which is 138547332. Now it's clear that for any n:

m * n <= 2^32 * n/31 < m * n + n ⇒ m * n // 2^32 <= n//31 <= (m * n + n) // 2^32

(Here we make use of the fact that if a < b then floor(a) <= floor(b).)

Furthermore, since n < 2^32, m * n // 2^32 and (m * n + n) // 2^32 are either the same integer or two consecutive integers. Consequently, one (or both) of those two is the actual value of n//31.

Now, we really want to compute n%31. So we need to multiply the (presumed) quotient by 31, and subtract that from n. If we use the smaller of the two possible quotients, it may turn out that the computed modulo value is too big, but it can only be too big by 31.

Or, to put it in code:

/* floor(2^32 / 31); 64 bits wide so that x * magic does not overflow */
static const unsigned long long magic = 138547332;
unsigned mod31g(unsigned x) {
  unsigned q = (x * magic) >> 32;
  // To multiply by 31, we multiply by 32 and subtract
  unsigned mod = x - ((q << 5) - q);
  return mod < 31 ? mod : mod - 31;
}
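The claim that the estimated quotient is never more than one too small can be spot-checked directly. This is a minimal sketch; `MAGIC31` and `quotient_shortfall` are names introduced here for illustration, and the helper uses a real division only to obtain the reference value.

```c
#include <assert.h>
#include <stdint.h>

/* m = floor(2^32 / 31), the "magic constant" from the text. */
static const uint64_t MAGIC31 = 138547332u;

/* Returns how far the estimate (m * n) >> 32 falls below the true n / 31.
   According to the argument above, the result should be 0 or 1. */
static uint32_t quotient_shortfall(uint32_t n) {
    /* n is promoted to 64 bits, so the product cannot overflow. */
    uint32_t estimate = (uint32_t)((n * MAGIC31) >> 32);
    return n / 31 - estimate;
}
```

For example, the estimate is exact for n = 32 but one too small for n = 31 and for n = 4294967295, which is why the function above needs its final conditional subtraction.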

The actual algorithm used by gcc avoids the test at the end by using a slightly more accurate computation based on multiplying by 2^37 // 31 + 1. That always produces the correct quotient, but at the cost of some extra shifts and adds to avoid integer overflow. As it turns out, the version above is slightly faster -- in the same benchmark as above, it took only 6.3 seconds.
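To illustrate where the extra shifts and adds come from, here is a sketch of the more accurate variant. It is written from the description above and is not a transcription of gcc's output; `MAGIC31_LOW` and `mod31_exact` are names made up for this example. The constant 2^37 // 31 + 1 = 4433514629 needs 33 bits, so only its low 32 bits are multiplied and the missing 2^32 term is added back by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of M = floor(2^37 / 31) + 1 = 4433514629 (M itself needs 33 bits). */
static const uint64_t MAGIC31_LOW = 138547333u;

static uint32_t mod31_exact(uint32_t x) {
    /* t = floor(x * (M - 2^32) / 2^32); x is promoted to 64 bits first. */
    uint32_t t = (uint32_t)((x * MAGIC31_LOW) >> 32);
    /* x + t could overflow 32 bits, so compute (x + t) / 2 as
       ((x - t) >> 1) + t, which is safe because t <= x.
       The remaining shift by 4 completes the division by 2^37. */
    uint32_t q = (((x - t) >> 1) + t) >> 4;
    /* q * 31 computed as q * 32 - q; any wraparound in the shift cancels
       out because unsigned arithmetic is modulo 2^32. */
    return x - ((q << 5) - q);
}
```

Because the quotient is exact, no comparison against 31 is needed at the end, at the price of the subtract, shift and add in the middle.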


Other benchmarked functions, for completeness:

Naive unrolled loop

unsigned mod31b(unsigned x) {
  unsigned tmp = x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31;

  tmp = (tmp >> 5) + (tmp & 31);
  return tmp >= 31 ? tmp - 31 : tmp;
}

@chux's improvement, slightly optimized

static const unsigned mask1 = (31U << 0) | (31U << 10) | (31U << 20) | (31U << 30);
static const unsigned mask2 = (31U << 5) | (31U << 15) | (31U << 25);
unsigned mod31c(unsigned x) {
  x = (x & mask1) + ((x & mask2) >> 5);
  x += x >> 20;
  x += x >> 10;

  x = (x & 31) + ((x >> 5) & 31);
  return x >= 31 ? x - 31: x;
}
rici answered Nov 09 '22 04:11