Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

speeding up "base conversion" for large integers

I am using a base-conversion algorithm to generate a permutation from a large integer (split into 32-bit words).

I use a relatively standard algorithm for this:

/* N = count,K is permutation index (0..N!-1) A[N] contains 0..N-1 */
i = 0;
while (N > 1) {
   swap A[i] and A[i+(k%N)]
   k = k / N
   N = N - 1
   i = i + 1
}

Unfortunately, the divide and modulo each iteration adds up, especially moving to large integers - But, it seems I could just use multiply!

/* As before, N is count, K is index, A[N] contains 0..N-1 */
/* Split is arbitrarily 128 (bits), for my current choice of N */
/* "Adjust" is precalculated: (1 << Split)/(N!) */
a = k*Adjust; /* a can be treated as a fixed point fraction */
i = 0;
while (N > 1) {
   a = a*N;  
   index = a >> Split;         
   a = a & ((1 << Split) - 1);  /* actually, just zeroing a register */       
   swap A[i] and A[i+index]
   N = N - 1
   i = i + 1
}

This is nicer, but doing large integer multiplies is still sluggish.

Question 1:
Is there a way of doing this faster?

Eg. Since I know that N*(N-1) is less than 2^32, could I pull out those numbers from one word, and merge in the 'leftovers'?
Or, is there a way to modify an arithetic decoder to pull out the indicies one at a time?

Question 2:
For the sake of curiosity - if I use multiplication to convert a number to base 10 without the adjustment, then the result is multiplied by (10^digits/2^shift). Is there a tricky way to remove this factor working with the decimal digits? Even with the adjustment factor, this seems like it would be faster -- why wouldn't standard libraries use this vs divide and mod?

like image 626
Lucky Fruit Avatar asked Nov 25 '10 04:11

Lucky Fruit


1 Answers

Seeing that you are talking about numbers like 2^128/(N!), it seems that in your problem N is going to be rather small (N < 35 according to my calculations). I suggest taking the original algorithm as a starting point; first switch the direction of the loop:

i = 2;
while (i < N) {
    swap A[N - 1 - i] and A[N - i + k % i]
       k = k / i
       i = i + 1
}

Now change the loop to do several permutations per iteration. I guess the speed of division is the same regardless of the number i, as long as i < 2^32.
Split the range 2...N-1 into sub-ranges so that the product of the numbers in each sub-range is less than 2^32:

2, 3, 4, ..., 12: product is 479001600
13, 14, ..., 19:  product is 253955520
20, 21, ..., 26:  product is 3315312000
27, 28, ..., 32:  product is 652458240
33, 34, 35:       product is 39270

Then, divide the long number k by the products instead of dividing by i. Each iteration will yield a remainder (less than 2^32) and a smaller number k. When you have the remainder, you can work with it in an inner loop using the original algorithm; which will now be faster because it doesn't involve long division.
Here is some code:

static const int rangeCount = 5;
static const int rangeLimit[rangeCount] = {13, 20, 27, 33, 36};
static uint32_t rangeProduct[rangeCount] = {
    479001600,
    253955520,
    3315312000,
    652458240,
    39270
};

for (int rangeIndex = 0; rangeIndex < rangeCount; ++rangeIndex)
{
    // The following two lines involve long division;
    // math libraries probably calculate both quotient and remainder
    // in one function call
    uint32_t rangeRemainder = k % rangeProduct[rangeIndex];
    k /= rangeProduct[rangeIndex];

    // A range starts where the previous range ended
    int rangeStart = (rangeIndex == 0) ? 2 : rangeLimit[rangeIndex - 1];

    // Iterate over range
    for (int i = rangeStart; i < rangeLimit[rangeIndex] && i < n; ++i)
    {
        // The following two lines involve a 32-bit division;
        // it produces both quotient and remainder in one Pentium instruction
        int remainder = rangeRemainder % i;
        rangeRemainder /= i;
        std::swap(permutation[n - 1 - i], permutation[n - i + remainder]);
    }
}

Of course, this code can be extended into more than 128 bits.
Another optimization could involve extraction of powers of 2 from the products of ranges; this might add a slight speedup by making the ranges longer. Not sure whether this is worthwhile (maybe for large values of N, like N=1000).

like image 103
anatolyg Avatar answered Oct 22 '22 17:10

anatolyg