In this article, a fast recursive formulation of the Chudnovsky pi formula using binary splitting is given. In python:
C = 640320
C3_OVER_24 = C**3 // 24
def bs(a, b):
if b - a == 1:
if a == 0:
Pab = Qab = 1
else:
Pab = (6*a-5)*(2*a-1)*(6*a-1)
Qab = a*a*a*C3_OVER_24
Tab = Pab * (13591409 + 545140134*a) # a(a) * p(a)
if a & 1:
Tab = -Tab
else:
m = (a + b) // 2
Pam, Qam, Tam = bs(a, m)
Pmb, Qmb, Tmb = bs(m, b)
Pab = Pam * Pmb
Qab = Qam * Qmb
Tab = Qmb * Tam + Pam * Tmb
return Pab, Qab, Tab
N = int(digits/DIGITS_PER_TERM + 1)
# Calclate P(0,N) and Q(0,N)
P, Q, T = bs(0, N)
one = 10**digits
sqrtC = sqrt(10005*one, one)
return (Q*426880*sqrtC) // T
This method is already very fast, but it's mentioned that an implementation on the GMP library website, gmp-chudnovsky.c, also factors the numerator and denominator in binary splitting. As the code is optimized and hard for me to understand, what is the general idea behind how this is done? I can't tell if fractions are simplified, numbers are kept in factorized form instead of being entirely multiplied, or both.
Here is a code sample of the binary splitting:
/* binary splitting */
void
bs(unsigned long a, unsigned long b, unsigned gflag, long int level)
{
unsigned long i, mid;
int ccc;
if (b-a==1) {
/*
g(b-1,b) = (6b-5)(2b-1)(6b-1)
p(b-1,b) = b^3 * C^3 / 24
q(b-1,b) = (-1)^b*g(b-1,b)*(A+Bb).
*/
mpz_set_ui(p1, b);
mpz_mul_ui(p1, p1, b);
mpz_mul_ui(p1, p1, b);
mpz_mul_ui(p1, p1, (C/24)*(C/24));
mpz_mul_ui(p1, p1, C*24);
mpz_set_ui(g1, 2*b-1);
mpz_mul_ui(g1, g1, 6*b-1);
mpz_mul_ui(g1, g1, 6*b-5);
mpz_set_ui(q1, b);
mpz_mul_ui(q1, q1, B);
mpz_add_ui(q1, q1, A);
mpz_mul (q1, q1, g1);
if (b%2)
mpz_neg(q1, q1);
i=b;
while ((i&1)==0) i>>=1;
fac_set_bp(fp1, i, 3); /* b^3 */
fac_mul_bp(fp1, 3*5*23*29, 3);
fp1[0].pow[0]--;
fac_set_bp(fg1, 2*b-1, 1); /* 2b-1 */
fac_mul_bp(fg1, 6*b-1, 1); /* 6b-1 */
fac_mul_bp(fg1, 6*b-5, 1); /* 6b-5 */
if (b>(int)(progress)) {
printf("."); fflush(stdout);
progress += percent*2;
}
} else {
/*
p(a,b) = p(a,m) * p(m,b)
g(a,b) = g(a,m) * g(m,b)
q(a,b) = q(a,m) * p(m,b) + q(m,b) * g(a,m)
*/
mid = a+(b-a)*0.5224; /* tuning parameter */
bs(a, mid, 1, level+1);
top++;
bs(mid, b, gflag, level+1);
top--;
if (level == 0)
puts ("");
ccc = level == 0;
if (ccc) CHECK_MEMUSAGE;
if (level>=4) { /* tuning parameter */
#if 0
long t = cputime();
#endif
fac_remove_gcd(p2, fp2, g1, fg1);
#if 0
gcd_time += cputime()-t;
#endif
}
if (ccc) CHECK_MEMUSAGE;
mpz_mul(p1, p1, p2);
if (ccc) CHECK_MEMUSAGE;
mpz_mul(q1, q1, p2);
if (ccc) CHECK_MEMUSAGE;
mpz_mul(q2, q2, g1);
if (ccc) CHECK_MEMUSAGE;
mpz_add(q1, q1, q2);
if (ccc) CHECK_MEMUSAGE;
fac_mul(fp1, fp2);
if (gflag) {
mpz_mul(g1, g1, g2);
fac_mul(fg1, fg2);
}
}
if (out&2) {
printf("p(%ld,%ld)=",a,b); fac_show(fp1);
if (gflag)
printf("g(%ld,%ld)=",a,b); fac_show(fg1);
}
}
The following comments are key:
/*
g(b-1,b) = (6b-5)(2b-1)(6b-1)
p(b-1,b) = b^3 * C^3 / 24
q(b-1,b) = (-1)^b*g(b-1,b)*(A+Bb).
*/
/*
p(a,b) = p(a,m) * p(m,b)
g(a,b) = g(a,m) * g(m,b)
q(a,b) = q(a,m) * p(m,b) + q(m,b) * g(a,m)
*/
/*
p*(C/D)*sqrt(C)
pi = -----------------
(q+A*p)
*/
Observe:
p
and q
by a factor of k
in the final division, it doesn't affect the result.p(a,m)
, g(a,m)
, and q(a,m)
by k
then that factor would simply be carried through to p(a,b)
, g(a,b)
, q(a,b)
; similarly if we were to scale each of p(m,b)
, g(m,b)
, and q(m,b)
then that factor would carry through.The line
fac_remove_gcd(p2, fp2, g1, fg1);
is effectively
k = gcd(p2, g1);
p2 /= k; // p(m,b) /= k
g1 /= k; // g(a,m) /= k
This has the net effect of downscaling p(a,b)
, g(a,b)
, and q(a,b)
by that gcd
. By the previous two observations, this downscaling carries through cleanly all the way to the final result.
I've tried three ways of implementing the factoring in Python.
gcd
does not give a speed-up6 * N
(i.e. which might divide g
) and testing each of those primes slows things down by a factor of 2 to 3.I conclude that getting a speed-up with this approach requires using mutable state to track the factorisations, so it's a big hit for maintainability.
I didn't look at the full code, but I had a quick glance at it in order to understand better the excerpt you provide in your question.
In order to answer to some points from your question, have a look at this piece of code first:
typedef struct {
unsigned long max_facs;
unsigned long num_facs;
unsigned long *fac;
unsigned long *pow;
} fac_t[1];
It defines a new type as a structure (if you don't know C at all, let's say it is like a rudimentary Pyhton object embedding variables but no methods). This type allows to handle integer numbers as two integer values and two arrays (say: two lists):
In the same time the code keeps the same numbers as big integers from the libgmp type (this is what is meant by mpz_t p
and mpz_t g
in the arguments of the function).
Now, what about the function you are showing. It is called fac_remove_gcd
; the initial fac
has to do with the name of the previously described type; the two following words are easy to understand: divide two integer numbers of type fac
by their gcd.
The C code iterates over the two lists of factors in both lists; it is easy to synchronize both lists since factors are ordered (section of the code around the else if
and else
statements); whenever two common factors are detected (initial if
statement), dividing is a matter of simple substraction: the smallest power is substracted in both lists of powers for this factor (for instance with a=2*5^3*17 and b=3*5^5*19, the value 3 will be substracted in the lists of powers for a and b at the position corresponding to factor 5 leading to a=2*5^0*17 and b=3*5^2*19).
During this operation a number (of the same fac
type) is created and called fmul
; it is obviously the gcd of both numbers.
After this step the gcd called fmul
and being of the fac
type is converted to a GMP big integer with the function (also in the code of the program) called bs_mul
. This allows to compute the GCD as a big integer in order to synchronize the new values of the divided numbers in both forms: big integers and special fac
type. Once the GCD is computed as a big integer, it is easy to divide both initial numbers by the GCD.
Thus the functions acts upon two versions of each initial numbers.
Hope it can help.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With