Fast calculation of floating 1/N if factorization of very large integer N is known

Question

If I have integer number N that I know factorization of, what is the fastest (most efficient) way of computing 1/N as a floating point number? Big floating (or integer) arithmetics should be used for that.

I want to do this in C++ (or to do experimental runs in Python).

My N is very huge, Giga/Tera-bits in size. Resulting floating point N should also have huge precision, around the same bit-size as initial N.

Precise floating value is needed meaning if I request floating precision Log2(N) bits then at least all 95% leading bits of result should be precise (all same bits as in ideal value).

Of course instead of floating point computation one can compute 4^Ceil(Log2(N)) / N as integer division, if it helps and/or simplifies task. For me these two tasks (integer and float) are essentially same, because integer representation is convertible to floating and vice versa.

One important note is that factorization of N has only small prime factors, all of them are 32-bit in size (maybe 64-bit at most, for sure).

I wonder if having factorization of N and also the fact that factors are small, can it help somehow solving the task?

Of course instead of implementing my own division first I tried to use highly-optimized GMP library for this task, but it (as far as I know) doesn't use the fact that N is factorized already.

Can anyone suggest if I'm about to implement my own function for this, just to figure out experimentally if it will be faster than GMP's, then what kind of algorithm should I use?

I figured out that there are 3 algorithms that can be used here 1) Long Division, which is a school-grade algorithm. 2) Barrett Reduction. 3) Montgomery Reduction.

Actually I don't know any other algorithms. Can you suggest other? Both Barrett and Montgomery reductions can help only if same prime factor repeats many times, otherwise single-time division is not worth precomputation that is needed for Barrett and Montgomery.

Also Barrett/Montgomery reductions still need one-time computing 4^Ceil(Log2(N)) / PrimeDivisor. So they don't save you from doing Long Division algorithm.

Definitely for Long Division algorithm I will be using 2^64 as a base instead of base 10 (as in school).

I already implemented my own experimental library with Long Division and all other integer arithmetics algorithms plus elliptic curve arithmetics too. Right now it is generic and hence not faster than GMP. Now I need special division algo that can be at least several times faster than GMP.

Sure that in Long Division I can use Montgomery and Barrett, because at each step it needs short division (128 bit integer divided by 64 bit integer) if it give any boost.

Also at each step of Long Division I can use Fast Fourier Transform or Number Theoretic Transform for doing multiplication.

Above are the only optimizations that I know about. Are there any other optimizations possible? Maybe FFT can be used for directly doing division (not just for multiplication)?

Jérôme Richard · Accepted Answer

General Idea

The fact that N can be factorized in small integers can help. Indeed, this means N = a1 * a2 * ... * an. Thus, 1/N = 1/(a1 * a2 * ... * an) = (1/a1) * (1/a2) * ... * (1/an).

The thing is computing 1/a for many factor a can be faster than computing 1/N. Indeed:

the period of the decimal/binary expansion of 1 / a should be far smaller than 1 / N since log2(a) is much smaller than log2(N) meaning that the exact representation of 1/a can be stored in a very compact way and very quickly;
the reciprocal of the factors can be computed independently, so in parallel;
when a factor is used multiple times in the factorization of N, its costly reciprocal can be computed only once;
multiplying numbers should be much faster than computing a reciprocal or a division.
the final multiplication-based reduction can be done in parallel using a pair-wise reduction strategy.

However, there is a catch: this method requires a lot of multiplications since there is a lot of factors, and although they can be mostly computed in parallel, this is still a quite expensive computation (especially the last multiplication which is hard to execute in parallel). Thus, I am not sure the method is much faster than the very optimized method implemented in GMP, but it's certainly worth a try.

Notes

It is probably better to multiply numbers of the same size together or even to multiply first numbers with the smallest decimal/binary expansion periods together (in order to minimize the size of the symbolic representation of the computed reciprocal resulting in a smaller memory footprint and faster multiplications).

There are about 200 millions of prime numbers fitting in a 32-bit, but most factors of N should fit in 16-bits (since smaller factors are much more frequent) and there are only about 6500 prime numbers fitting in 16-bits, so their reciprocal could be precomputed if you plan to compute multiple reciprocals of different N.

You can use an exponentiation by squaring algorithm to efficiently compute the reciprocal of p^e when e is big. Such a term appear frequently in the exponential form of the prime decomposition of N when N is big.

You can truncate the period of the decimal/binary expansion if it is bigger than the final floating-point number. This will probably impact the accuracy of the method though. I think at least some additional digits are certainly needed to get an exact result (especially due to the many multiplications applied).

For the last multiplications involving huge symbolic representations, you can generate big floating-point numbers and provide them to GMP so it can use a very optimized implementation (typically based on Fast-Fourrier transforms).

Example

Here is an example with a small N = 307230 using a decimal representation for sake of clarity (the period of the decimal expansion is underlined):

1/307230 = 0.0000032548904729355857175406047586498714318263190443641571461120333300784428603977476157927285746834619015070142889691761872213
              ------------------------------------------------------------------------------------------------------------------------------

307230 = 2 * 3 * 5 * (7^2) * 11 * 19

1/2 = 0.50
         -
1/3 = 0.3
        -
1/5 = 0.20
         -
1/7 = 0.14285714
          ------
1/11 = 0.090
          --
1/19 = 0.0526315789473684210
          ------------------

k1 = ((1/2) * (1/3)) * ((1/5) * (1/11)) = 0.0030
                                          --

k2 = k1 * (1/19) = 0.000159489633173843700
                        ------------------

k3 = (1/7)^2 = 0.0204081632653061224489795918367346938775510
                  ------------------------------------------

1/N = k2 * k3

For the final multiplication, you can use an exact algorithm or a fast approximation with GMP.

Fast calculation of floating 1/N if factorization of very large integer N is known

Tags:

c++

performance

python

algorithm

math

Arty

1 Answers

General Idea

Notes

Example

Jérôme Richard

Recent Activity

Donate For Us

Fast calculation of floating 1/N if factorization of very large integer N is known

Tags:

c++

performance

python

algorithm

math

Arty

1 Answers

General Idea

Notes

Example

Jérôme Richard

Related questions

Recent Activity

Donate For Us