Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Newton-Raphson Division With Big Integers

I'm making a BigInt class as a programming exercise. It uses a vector of 2's complement signed ints in base-65536 (so that 32-bit multiplications don't overflow. I will increase the base once I get it fully working).

All of the basic math operations are coded, with one problem: division is painfully slow with the basic algorithm I was able to create. (It kind of works like binary division for each digit of the quotient... I'm not going to post it unless someone wants to see it....)

Instead of my slow algorithm, I want to use Newton-Raphson to find the (shifted) reciprocal and then multiply (and shift). I think I have my head around the basics: you give the formula (x1 = x0(2 - x0 * divisor)) a good initial guess, and then after some amount of iterations, x converges to the reciprocal. This part seems easy enough... but I am running into some problems when trying to apply this formula to big integers:

Problem 1:

Because I am working with integers... well... I can't use fractions. This seems to cause x to always diverge (x0 * divisor must be <2 it seems?). My intuition tells me there should be some modification to the equation that would allow it to work integers (to some accuracy) but I am really struggling to find out what it is. (My lack of math skills are beating me up here....) I think I need to find some equivalent equation where instead of d there is d*[base^somePower]? Can there be some equation like (x1 = x0(2 - x0 * d)) that works with whole numbers?

Problem 2:

When I use Newton's formula to find the reciprocal of some numbers, the result ends up being just a small faction below what the answer should be... ex. when trying to find reciprocal of 4 (in decimal):

x0 = 0.3
x1 = 0.24
x2 = 0.2496
x3 = 0.24999936
x4 = 0.2499999999983616
x5 = 0.24999999999999999999998926258176

If I were representing numbers in base-10, I would want a result of 25 (and to remember to right shift product by 2). With some reciprocals such as 1/3, you can simply truncate the result after you know you have enough accuracy. But how can I pull out the correct reciprocal from the above result?

Sorry if this is all too vague or if I'm asking for too much. I looked through Wikipedia and all of the research papers I could find on Google, but I feel like I'm banging my head against a wall. I appreciate any help anyone can give me!

...

Edit: Got the algorithm working, although it is much slower than I expected. I actually lost a lot of speed compared to my old algorithm, even on numbers with thousands of digits... I'm still missing something. It's not a problem with multiplication, which is very fast. (I am indeed using Karatsuba's algorithm).

For anyone interested, here is my current iteration of the Newton-Raphson algorithm:

bigint operator/(const bigint& lhs, const bigint& rhs) {
    if (rhs == 0) throw overflow_error("Divide by zero exception");
    bigint dividend = lhs;
    bigint divisor = rhs;

    bool negative = 0;
    if (dividend < 0) {
        negative = !negative;
        dividend.invert();
    }
    if (divisor < 0) {
        negative = !negative;
        divisor.invert();
    }

    int k = dividend.numBits() + divisor.numBits();
    bigint pow2 = 1;
    pow2 <<= k + 1;

    bigint x = dividend - divisor;
    bigint lastx = 0;
    bigint lastlastx = 0;
    while (1) {
        x = (x * (pow2 - x * divisor)) >> k;
        if (x == lastx || x == lastlastx) break;
        lastlastx = lastx;
        lastx = x;
    }
    bigint quotient = dividend * x >> k;
    if (dividend - (quotient * divisor) >= divisor) quotient++;
    if (negative)quotient.invert();
    return quotient;
}

And here is my (really ugly) old algorithm that is faster:

bigint operator/(const bigint& lhs, const bigint & rhs) {
    if (rhs == 0) throw overflow_error("Divide by zero exception");
    bigint dividend = lhs;
    bigint divisor = rhs;

    bool negative = 0;
    if (dividend < 0) {
        negative = !negative;
        dividend.invert();
    }
    if (divisor < 0) {
        negative = !negative;
        divisor.invert();
    }

    bigint remainder = 0;
    bigint quotient = 0;
    while (dividend.value.size() > 0) {
        remainder.value.insert(remainder.value.begin(), dividend.value.at(dividend.value.size() - 1));
        remainder.value.push_back(0);
        remainder.unPad();
        dividend.value.pop_back();

        if (divisor > remainder) {
            quotient.value.push_back(0);
        } else {
            int count = 0;
            int i = MSB;
            bigint value = 0;
            while (i > 0) {
                bigint increase = divisor * i;
                bigint next = value + increase;
                if (next <= remainder) {
                    value = next;
                    count += i;
                }
                i >>= 1;
            }
            quotient.value.push_back(count);
            remainder -= value;
        }
    }

    for (int i = 0; i < quotient.value.size() / 2; i++) {
        int swap = quotient.value.at(i);
        quotient.value.at(i) = quotient.value.at((quotient.value.size() - 1) - i);
        quotient.value.at(quotient.value.size() - 1 - i) = swap;
    }

    if (negative)quotient.invert();
    quotient.unPad();
    return quotient;
}
like image 395
user3044553 Avatar asked Jan 06 '15 15:01

user3044553


2 Answers

First of all, you can implement division in time O(n^2) and with reasonable constant, so it's not (much) slower than the naive multiplication. However, if you use Karatsuba-like algorithm, or even FFT-based multiplication algorithm, then you indeed can speedup your division algorithm using Newton-Raphson.

A Newton-Raphson iteration for calculating the reciprocal of x is q[n+1]=q[n]*(2-q[n]*x).

Suppose we want to calculate floor(2^k/B) where B is a positive integer. WLOG, B≤2^k; otherwise, the quotient is 0. The Newton-Raphson iteration for x=B/2^k yields q[n+1]=q[n]*(2-q[n]*B/2^k). we can rearrange it as

q[n+1]=q[n]*(2^(k+1)-q[n]*B) >> k

Each iteration of this kind requires only integer multiplications and bit shifts. Does it converge to floor(2^k/B)? Not necessarily. However, in the worst case, it eventually alternates between floor(2^k/B) and ceiling(2^k/B) (Prove it!). So you can use some not-so-clever test to see if you are in this case, and extract floor(2^k/B). (this "not-so-clever test" should be alot faster than the multiplications in each iteration; However, it will be nice to optimize this thing).

Indeed, calculating floor(2^k/B) suffices in order to calculate floor(A/B) for any positive integers A,B. Take k such that A*B≤2^k, and verify floor(A/B)=A*ceiling(2^k/B) >> k.

Lastly, a simple but important optimization for this approach is to truncate multiplications (i.e. calculate only the higher bits of the product) in the early iterations of the Newton-Raphson method. The reason to do so, is that the results of the early iterations are far from the quotient, and it doesn't matter to perform them inaccurately. (Refine this argument and show that if you do this thing appropriately, you can divide two ≤n-bit integers in time O(M(2n)), assuming you can multiply two ≤k-bit integers in time M(k), and M(x) is an increasing convex function).

like image 109
ohad Avatar answered Sep 20 '22 02:09

ohad


If I see this correctly a major improvement is picking a good starting value for x. Knowing how many digits the divisor has you know where the most significant bit of the inverse has to be, as

1/x = pow(2,log2(1/x))
1/x = pow(2,-log2(x))
1/x >= pow(2,-floor(log2(x)))

floor(log2(x)) simply is the index of the most significant bit set.

As suggested in the comment by the op using a 256-bit lookup table is a going to speed up the convergence even more, because each step roughly doubles the amount of correct digits. Starting with 8 correct digits is better than starting with 1 and much better than starting with even less than that.

  constexpr fixpoint_integer_inverse(const T& d) {
    uint8_t lut[256] = { 255u,254u,253u,252u,251u,250u,249u,248u,247u,246u,245u,244u,243u,242u,241u,
240u,240u,239u,238u,237u,236u,235u,234u,234u,233u,232u,231u,230u,229u,229u,228u,
227u,226u,225u,225u,224u,223u,222u,222u,221u,220u,219u,219u,218u,217u,217u,216u,
215u,214u,214u,213u,212u,212u,211u,210u,210u,209u,208u,208u,207u,206u,206u,205u,
204u,204u,203u,202u,202u,201u,201u,200u,199u,199u,198u,197u,197u,196u,196u,195u,
195u,194u,193u,193u,192u,192u,191u,191u,190u,189u,189u,188u,188u,187u,187u,186u,
186u,185u,185u,184u,184u,183u,183u,182u,182u,181u,181u,180u,180u,179u,179u,178u,
178u,177u,177u,176u,176u,175u,175u,174u,174u,173u,173u,172u,172u,172u,171u,171u,
170u,170u,169u,169u,168u,168u,168u,167u,167u,166u,166u,165u,165u,165u,164u,164u,
163u,163u,163u,162u,162u,161u,161u,161u,160u,160u,159u,159u,159u,158u,158u,157u,
157u,157u,156u,156u,156u,155u,155u,154u,154u,154u,153u,153u,153u,152u,152u,152u,
151u,151u,151u,150u,150u,149u,149u,149u,148u,148u,148u,147u,147u,147u,146u,146u,
146u,145u,145u,145u,144u,144u,144u,144u,143u,143u,143u,142u,142u,142u,141u,141u,
141u,140u,140u,140u,140u,139u,139u,139u,138u,138u,138u,137u,137u,137u,137u,136u,
136u,136u,135u,135u,135u,135u,134u,134u,134u,134u,133u,133u,133u,132u,132u,132u,
132u,131u,131u,131u,131u,130u,130u,130u,130u,129u,129u,129u,129u,128u,128u,128u,
127u
    };
    const auto l = log2(d);
    T x;
    if (l<8) {
      x = T(1)<<(digits(d)-1-l);
    } else {
      if (digits(d)>(l+8)) x = T(lut[(d>>(l-8))-256])<<(digits(d)-l-8);
      else x = T(lut[(d>>(l-8))-256])>>(l+8-digits(d));
    }
    if (x==0) x=1;
    while(true) {
      const auto lm = long_mul(x,T(1)-x*d);
      const T i = get<0>(lm);
      if (i) x+=i;
      else return x;
    }
    return x;
  }

  // calculate a * b = r0r1
  template<typename T>
  typename std::enable_if<std::is_unsigned<T>::value,tuple<T,T>>::type
  constexpr long_mul(const T& a, const T& b){
    const T N  = digits<T>()/2;
    const T t0 = (a>>N)*(b>>N);
    const T t1 = ((a<<N)>>N)*(b>>N);
    const T t2 = (a>>N)*((b<<N)>>N);
    const T t3 = ((a<<N)>>N)*((b<<N)>>N);
    const T t4 = t3+(t1<<N);
    const T r1 = t4+(t2<<N);
    const T r0 = (r1<t4)+(t4<t3)+(t1>>N)+(t2>>N)+t0;
    return {r0,r1};
  }
like image 30
Wolfgang Brehm Avatar answered Sep 21 '22 02:09

Wolfgang Brehm