Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast exact bigint factorial

I have a fixed-point bignumber library and want to implement fast factorial with no precision loss.

After some math tricks on paper I got this formula:

(4N)!=((2N)!).((2N)!).{ (2N+1).(2N+3).(2N+5)...(4N-1) }.(2^N)/(N!)

This is already pretty fast, and with some programming tricks the complexity nears ~ O(log(n)).

To be clear, my current implementation is this:

//---------------------------------------------------------------------------
longnum fact(const DWORD &x,longnum &h) // h return (x>>1)! to speed up computation
    {
    if (x==0) { h=1; return  1; }
    if (x==1) { h=1; return  1; }
    if (x==2) { h=1; return  2; }
    if (x==3) { h=1; return  6; }
    if (x==4) { h=2; return 24; }
    int N4,N2,N,i; longnum c,q;
    N=(x>>2);
    N2=N<<1;
    N4=N<<2;
    h=fact(N2,q);                                          // get 2N! and N!
    c=h*h; for (i=(N2+1)|1;i<=N4;i+=2) c*=i; c/=q;         // c= ((2N!)^2)*T1 / N!
    for (i=N4+1;i<=x;i++) c*=i; c.round(); c<<=N  ;        // convert 4N! -> x!, cut off precision losses
    for (i=(N2+1)|1,N2=x>>1;i<=N2;i++) h*=i; h.round();    // convert 2N! -> (x/2)!, cut off precision losses
    return c;
    }
//---------------------------------------------------------------------------
longnum fact(const DWORD &x)
    {
    longnum tmp;
    return fact(x,tmp);
    }
//---------------------------------------------------------------------------

Now my question:

  1. Is there a fast way to obtain N! from this term: T1 = { (2N+1).(2N+3).(2N+5)...(4N-1) }?

    Already answered.

So to be clear, I need to extract this unknown term:

T2 = (4N)! / (((2N)!).((2N)!))

so:

(4N)! = (((2N)!).((2N)!)).T2

This would help a lot because then it would not be needed to compute .../(N!) for factorial.

The T1 term is always integer-decomposable to this:

T1 = T2 * N!

Finally, it hit me :) I have done a little program for primes decomposition of factorials and then suddenly all becomes much clearer:

4! =  2!.2!.(2^1).(3^1) = 24
8! =  4!.4!.(2^1).(5^1).(7^1) = 40320
12! =  6!.6!.(2^2).(3^1).(7^1).(11^1) = 479001600
16! =  8!.8!.(2^1).(3^2).(5^1).(11^1).(13^1) = 20922789888000
20! =  10!.10!.(2^2).(11^1).(13^1).(17^1).(19^1) = 2432902008176640000
24! =  12!.12!.(2^2).(7^1).(13^1).(17^1).(19^1).(23^1) = 620448401733239439360000
28! =  14!.14!.(2^3).(3^3).(5^2).(17^1).(19^1).(23^1) = 304888344611713860501504000000
32! =  16!.16!.(2^1).(3^2).(5^1).(17^1).(19^1).(23^1).(29^1).(31^1) = 263130836933693530167218012160000000
36! =  18!.18!.(2^2).(3^1).(5^2).(7^1).(11^1).(19^1).(23^1).(29^1).(31^1) = 371993326789901217467999448150835200000000
40! =  20!.20!.(2^2).(3^2).(5^1).(7^1).(11^1).(13^1).(23^1).(29^1).(31^1).(37^1) = 815915283247897734345611269596115894272000000000

After analyzing the prime exponents of the T2 term (the rest after half factorials ^ 2) I derive the formula for them:

T2(4N) = multiplication(i=2,3,5,7,11,13,17,...) of ( i ^ sum(j=1,2,3,4,5,...) of (4N/(i^j))-(2N/(i^j)) )
  • where multiplication is through all primes <= 4N
  • where sumation is until i^j <= 4N

The problem is that the divisions 4N/(i^j) and 2N/(i^j) must be done in integer math so they cannot be simplified easily.

So I have another question:

  1. How can I compute this: exponent(i) = sum(j=1,2,3,4,5,...) of (N/(i^j)) effectively?

    i is any prime where i<=N. It should be easy.

Now I calculate the exponent e for prime i inside the T2(N) term like this (but this is too complex for my taste):

for (e=0,a=N/i,b=(N>>1)/i;(a)||(b);e+=a-b-b,a/=i,b/=i);

... I will try implement T2 into fact(x) and compare speeds ...

like image 529
Spektre Avatar asked Aug 19 '13 15:08

Spektre


1 Answers

I have a solution:

(4N!)=((2N!)^2) . mul(i=all primes<=4N) of [i^sum(j=1,2,3,4,5,...4N>=i^j) of [(4N/(i^j))%2]]

sub-terms of T2 are always prime^exponent where exponent can be computed on small integers like this:

for (e=0,j=N4;j;e+=j&1,j/=p);

where e is exponent, p is prime and N4 is 4*N

Code for the new equation:

// edit beg:
// Sorry, forget to copy sorted list of all primes up to max n here it is
// end of table is marked with 0
// Primes are in DWORDs so they only 4Byte per number
// so the table is very small compared with lookup table for the same max n!
// and also primes are needed for many other routines in bignum
// can compute n! for n <= max prime in table
DWORD _arithmetics_primes[]={2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,0};
// edit end.

longnum fact(const DWORD &x)
    {
    if (x<=4)
        {
        if (x==4) return 24;
        if (x==3) return  6;
        if (x==2) return  2;
        if (x==1) return  1;
        if (x==0) return  1;
        }
    int N4,N2,p,i,j,e; longnum c,pp;
    N4=(x>>2)<<2;
    N2=N4>>1;
    c=fact(N2); c*=c;                 // c=((2N)!)^2;
    for (i=0;;i++)                    // c*= T2
        {
        p=_arithmetics_primes[i];
        if (!p) break;
        if (p>N4) break;
        for (e=0,j=N4;j;e+=j&1,j/=p);
        if (e)                        // c*=p^e
            {
            if (p==2) c<<=e;
            else for (pp=p;;)
                {
                if (int(e&1)) c*=pp;
                e>>=1; if (!e) break;
                pp*=pp;
                }
            }
        }
    for (i=N4+1;i<=x;i++) { c*=i; } c.round();
    return c;
    }

Here are rough time measurements for the first 128 factorials so you can estimate real complexity.

Fixed point 768.128 bits arithmetics ... 231.36 decimals.

[ 0.001 ms ] 1! = 1
[ 0.000 ms ] 2! = 2
[ 0.000 ms ] 3! = 6
[ 0.000 ms ] 4! = 24
[ 0.006 ms ] 5! = 120
[ 0.006 ms ] 6! = 720
[ 0.007 ms ] 7! = 5040
[ 0.005 ms ] 8! = 40320
[ 0.006 ms ] 9! = 362880
[ 0.007 ms ] 10! = 3628800
[ 0.008 ms ] 11! = 39916800
[ 0.012 ms ] 12! = 479001600
[ 0.013 ms ] 13! = 6227020800
[ 0.014 ms ] 14! = 87178291200
[ 0.016 ms ] 15! = 1307674368000
[ 0.014 ms ] 16! = 20922789888000
[ 0.015 ms ] 17! = 355687428096000
[ 0.017 ms ] 18! = 6402373705728000
[ 0.019 ms ] 19! = 121645100408832000
[ 0.016 ms ] 20! = 2432902008176640000
[ 0.017 ms ] 21! = 51090942171709440000
[ 0.019 ms ] 22! = 1124000727777607680000
[ 0.021 ms ] 23! = 25852016738884976640000
[ 0.023 ms ] 24! = 620448401733239439360000
[ 0.025 ms ] 25! = 15511210043330985984000000
[ 0.027 ms ] 26! = 403291461126605635584000000
[ 0.029 ms ] 27! = 10888869450418352160768000000
[ 0.032 ms ] 28! = 304888344611713860501504000000
[ 0.034 ms ] 29! = 8841761993739701954543616000000
[ 0.037 ms ] 30! = 265252859812191058636308480000000
[ 0.039 ms ] 31! = 8222838654177922817725562880000000
[ 0.034 ms ] 32! = 263130836933693530167218012160000000
[ 0.037 ms ] 33! = 8683317618811886495518194401280000000
[ 0.039 ms ] 34! = 295232799039604140847618609643520000000
[ 0.041 ms ] 35! = 10333147966386144929666651337523200000000
[ 0.039 ms ] 36! = 371993326789901217467999448150835200000000
[ 0.041 ms ] 37! = 13763753091226345046315979581580902400000000
[ 0.044 ms ] 38! = 523022617466601111760007224100074291200000000
[ 0.046 ms ] 39! = 20397882081197443358640281739902897356800000000
[ 0.041 ms ] 40! = 815915283247897734345611269596115894272000000000
[ 0.044 ms ] 41! = 33452526613163807108170062053440751665152000000000
[ 0.046 ms ] 42! = 1405006117752879898543142606244511569936384000000000
[ 0.049 ms ] 43! = 60415263063373835637355132068513997507264512000000000
[ 0.048 ms ] 44! = 2658271574788448768043625811014615890319638528000000000
[ 0.050 ms ] 45! = 119622220865480194561963161495657715064383733760000000000
[ 0.054 ms ] 46! = 5502622159812088949850305428800254892961651752960000000000
[ 0.056 ms ] 47! = 258623241511168180642964355153611979969197632389120000000000
[ 0.056 ms ] 48! = 12413915592536072670862289047373375038521486354677760000000000
[ 0.060 ms ] 49! = 608281864034267560872252163321295376887552831379210240000000000
[ 0.063 ms ] 50! = 30414093201713378043612608166064768844377641568960512000000000000
[ 0.066 ms ] 51! = 1551118753287382280224243016469303211063259720016986112000000000000
[ 0.065 ms ] 52! = 80658175170943878571660636856403766975289505440883277824000000000000
[ 0.069 ms ] 53! = 4274883284060025564298013753389399649690343788366813724672000000000000
[ 0.072 ms ] 54! = 230843697339241380472092742683027581083278564571807941132288000000000000
[ 0.076 ms ] 55! = 12696403353658275925965100847566516959580321051449436762275840000000000000
[ 0.077 ms ] 56! = 710998587804863451854045647463724949736497978881168458687447040000000000000
[ 0.162 ms ] 57! = 40526919504877216755680601905432322134980384796226602145184481280000000000000
[ 0.095 ms ] 58! = 2350561331282878571829474910515074683828862318181142924420699914240000000000000
[ 0.093 ms ] 59! = 138683118545689835737939019720389406345902876772687432540821294940160000000000000
[ 0.089 ms ] 60! = 8320987112741390144276341183223364380754172606361245952449277696409600000000000000
[ 0.093 ms ] 61! = 507580213877224798800856812176625227226004528988036003099405939480985600000000000000
[ 0.098 ms ] 62! = 31469973260387937525653122354950764088012280797258232192163168247821107200000000000000
[ 0.096 ms ] 63! = 1982608315404440064116146708361898137544773690227268628106279599612729753600000000000000
[ 0.090 ms ] 64! = 126886932185884164103433389335161480802865516174545192198801894375214704230400000000000000
[ 0.100 ms ] 65! = 8247650592082470666723170306785496252186258551345437492922123134388955774976000000000000000
[ 0.104 ms ] 66! = 544344939077443064003729240247842752644293064388798874532860126869671081148416000000000000000
[ 0.111 ms ] 67! = 36471110918188685288249859096605464427167635314049524593701628500267962436943872000000000000000
[ 0.100 ms ] 68! = 2480035542436830599600990418569171581047399201355367672371710738018221445712183296000000000000000
[ 0.121 ms ] 69! = 171122452428141311372468338881272839092270544893520369393648040923257279754140647424000000000000000
[ 0.109 ms ] 70! = 11978571669969891796072783721689098736458938142546425857555362864628009582789845319680000000000000000
[ 0.119 ms ] 71! = 850478588567862317521167644239926010288584608120796235886430763388588680378079017697280000000000000000
[ 0.104 ms ] 72! = 61234458376886086861524070385274672740778091784697328983823014963978384987221689274204160000000000000000
[ 0.124 ms ] 73! = 4470115461512684340891257138125051110076800700282905015819080092370422104067183317016903680000000000000000
[ 0.113 ms ] 74! = 330788544151938641225953028221253782145683251820934971170611926835411235700971565459250872320000000000000000
[ 0.118 ms ] 75! = 24809140811395398091946477116594033660926243886570122837795894512655842677572867409443815424000000000000000000
[ 0.118 ms ] 76! = 1885494701666050254987932260861146558230394535379329335672487982961844043495537923117729972224000000000000000000
[ 0.123 ms ] 77! = 145183092028285869634070784086308284983740379224208358846781574688061991349156420080065207861248000000000000000000
[ 0.129 ms ] 78! = 11324281178206297831457521158732046228731749579488251990048962825668835325234200766245086213177344000000000000000000
[ 0.133 ms ] 79! = 894618213078297528685144171539831652069808216779571907213868063227837990693501860533361810841010176000000000000000000
[ 0.121 ms ] 80! = 71569457046263802294811533723186532165584657342365752577109445058227039255480148842668944867280814080000000000000000000
[ 0.119 ms ] 81! = 5797126020747367985879734231578109105412357244731625958745865049716390179693892056256184534249745940480000000000000000000
[ 0.131 ms ] 82! = 475364333701284174842138206989404946643813294067993328617160934076743994734899148613007131808479167119360000000000000000000
[ 0.150 ms ] 83! = 39455239697206586511897471180120610571436503407643446275224357528369751562996629334879591940103770870906880000000000000000000
[ 0.141 ms ] 84! = 3314240134565353266999387579130131288000666286242049487118846032383059131291716864129885722968716753156177920000000000000000000
[ 0.148 ms ] 85! = 281710411438055027694947944226061159480056634330574206405101912752560026159795933451040286452340924018275123200000000000000000000
[ 0.154 ms ] 86! = 24227095383672732381765523203441259715284870552429381750838764496720162249742450276789464634901319465571660595200000000000000000000
[ 0.163 ms ] 87! = 2107757298379527717213600518699389595229783738061356212322972511214654115727593174080683423236414793504734471782400000000000000000000
[ 0.211 ms ] 88! = 185482642257398439114796845645546284380220968949399346684421580986889562184028199319100141244804501828416633516851200000000000000000000
[ 0.151 ms ] 89! = 16507955160908461081216919262453619309839666236496541854913520707833171034378509739399912570787600662729080382999756800000000000000000000
[ 0.157 ms ] 90! = 1485715964481761497309522733620825737885569961284688766942216863704985393094065876545992131370884059645617234469978112000000000000000000000
[ 0.166 ms ] 91! = 135200152767840296255166568759495142147586866476906677791741734597153670771559994765685283954750449427751168336768008192000000000000000000000
[ 0.161 ms ] 92! = 12438414054641307255475324325873553077577991715875414356840239582938137710983519518443046123837041347353107486982656753664000000000000000000000
[ 0.169 ms ] 93! = 1156772507081641574759205162306240436214753229576413535186142281213246807121467315215203289516844845303838996289387078090752000000000000000000000
[ 0.173 ms ] 94! = 108736615665674308027365285256786601004186803580182872307497374434045199869417927630229109214583415458560865651202385340530688000000000000000000000
[ 0.188 ms ] 95! = 10329978488239059262599702099394727095397746340117372869212250571234293987594703124871765375385424468563282236864226607350415360000000000000000000000
[ 0.181 ms ] 96! = 991677934870949689209571401541893801158183648651267795444376054838492222809091499987689476037000748982075094738965754305639874560000000000000000000000
[ 0.187 ms ] 97! = 96192759682482119853328425949563698712343813919172976158104477319333745612481875498805879175589072651261284189679678167647067832320000000000000000000000
[ 0.194 ms ] 98! = 9426890448883247745626185743057242473809693764078951663494238777294707070023223798882976159207729119823605850588608460429412647567360000000000000000000000
[ 0.201 ms ] 99! = 933262154439441526816992388562667004907159682643816214685929638952175999932299156089414639761565182862536979208272237582511852109168640000000000000000000000
[ 0.185 ms ] 100! = 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
[ 0.191 ms ] 101! = 9425947759838359420851623124482936749562312794702543768327889353416977599316221476503087861591808346911623490003549599583369706302603264000000000000000000000000
[ 0.202 ms ] 102! = 961446671503512660926865558697259548455355905059659464369444714048531715130254590603314961882364451384985595980362059157503710042865532928000000000000000000000000
[ 0.207 ms ] 103! = 99029007164861804075467152545817733490901658221144924830052805546998766658416222832141441073883538492653516385977292093222882134415149891584000000000000000000000000
[ 0.242 ms ] 104! = 10299016745145627623848583864765044283053772454999072182325491776887871732475287174542709871683888003235965704141638377695179741979175588724736000000000000000000000000
[ 0.210 ms ] 105! = 1081396758240290900504101305800329649720646107774902579144176636573226531909905153326984536526808240339776398934872029657993872907813436816097280000000000000000000000000
[ 0.215 ms ] 106! = 114628056373470835453434738414834942870388487424139673389282723476762012382449946252660360871841673476016298287096435143747350528228224302506311680000000000000000000000000
[ 0.221 ms ] 107! = 12265202031961379393517517010387338887131568154382945052653251412013535324922144249034658613287059061933743916719318560380966506520420000368175349760000000000000000000000000
[ 0.217 ms ] 108! = 1324641819451828974499891837121832599810209360673358065686551152497461815091591578895743130235002378688844343005686404521144382704205360039762937774080000000000000000000000000
[ 0.226 ms ] 109! = 144385958320249358220488210246279753379312820313396029159834075622223337844983482099636001195615259277084033387619818092804737714758384244334160217374720000000000000000000000000
[ 0.232 ms ] 110! = 15882455415227429404253703127090772871724410234473563207581748318444567162948183030959960131517678520479243672638179990208521148623422266876757623911219200000000000000000000000000
[ 0.240 ms ] 111! = 1762952551090244663872161047107075788761409536026565516041574063347346955087248316436555574598462315773196047662837978913145847497199871623320096254145331200000000000000000000000000
[ 0.213 ms ] 112! = 197450685722107402353682037275992488341277868034975337796656295094902858969771811440894224355027779366597957338237853638272334919686385621811850780464277094400000000000000000000000000
[ 0.231 ms ] 113! = 22311927486598136465966070212187151182564399087952213171022161345724023063584214692821047352118139068425569179220877461124773845924561575264739138192463311667200000000000000000000000000
[ 0.240 ms ] 114! = 2543559733472187557120132004189335234812341496026552301496526393412538629248600474981599398141467853800514886431180030568224218435400019580180261753940817530060800000000000000000000000000
[ 0.252 ms ] 115! = 292509369349301569068815180481773552003419272043053514672100535242441942363589054622883930786268803187059211939585703515345785120071002251720730101703194015956992000000000000000000000000000
[ 0.248 ms ] 116! = 33931086844518982011982560935885732032396635556994207701963662088123265314176330336254535971207181169698868584991941607780111073928236261199604691797570505851011072000000000000000000000000000
[ 0.598 ms ] 117! = 3969937160808720895401959629498630647790406360168322301129748464310422041758630649341780708631240196854767624444057168110272995649603642560353748940315749184568295424000000000000000000000000000
[ 0.259 ms ] 118! = 468452584975429065657431236280838416439267950499862031533310318788629800927518416622330123618486343228862579684398745837012213486653229822121742374957258403779058860032000000000000000000000000000
[ 0.261 ms ] 119! = 55745857612076058813234317117419771556272886109483581752463927935846946310374691578057284710599874844234646982443450754604453404911734348832487342619913750049708004343808000000000000000000000000000
[ 0.254 ms ] 120! = 6689502913449127057588118054090372586752746333138029810295671352301633557244962989366874165271984981308157637893214090552534408589408121859898481114389650005964960521256960000000000000000000000000000
[ 0.263 ms ] 121! = 809429852527344373968162284544935082997082306309701607045776233628497660426640521713391773997910182738287074185078904956856663439318382745047716214841147650721760223072092160000000000000000000000000000
[ 0.270 ms ] 122! = 98750442008336013624115798714482080125644041369783596059584700502676714572050143649033796427745042294071023050579626404736512939596842694895821378210620013388054747214795243520000000000000000000000000000
[ 0.281 ms ] 123! = 12146304367025329675766243241881295855454217088483382315328918161829235892362167668831156960612640202170735835221294047782591091570411651472186029519906261646730733907419814952960000000000000000000000000000
[ 0.290 ms ] 124! = 1506141741511140879795014161993280686076322918971939407100785852066825250652908790935063463115967385069171243567440461925041295354731044782551067660468376444194611004520057054167040000000000000000000000000000
[ 0.322 ms ] 125! = 188267717688892609974376770249160085759540364871492425887598231508353156331613598866882932889495923133646405445930057740630161919341380597818883457558547055524326375565007131770880000000000000000000000000000000
[ 0.303 ms ] 126! = 23721732428800468856771473051394170805702085973808045661837377170052497697783313457227249544076486314839447086187187275319400401837013955325179315652376928996065123321190898603130880000000000000000000000000000000
[ 0.313 ms ] 127! = 3012660018457659544809977077527059692324164918673621799053346900596667207618480809067860692097713761984609779945772783965563851033300772326297773087851869982500270661791244122597621760000000000000000000000000000000
[ 0.307 ms ] 128! = 385620482362580421735677065923463640617493109590223590278828403276373402575165543560686168588507361534030051833058916347592172932262498857766114955245039357760034644709279247692495585280000000000000000000000000000000
refernce     128! = 385620482362580421735677065923463640617493109590223590278828403276373402575165543560686168588507361534030051833058916347592172932262498857766114955245039357760034644709279247692495585280000000000000000000000000000000

My measurements reveal that N! uses

  • max of 2.2N fast low level long operations (+,-,<<,>>)
  • slightly less than N/2 long multiplications, but most of them are convenient in size which speeds up the multiplication, so the measured times do not match the obvious O(N/2*O(multiplication)).
  • After using analysis on the times the observed complexity best matches O(N^1.4) using Karatsuba as results are still way below NTT based multiplication, after that the complexity should be even better.

Also I have tried factorial as non recursive multiplication of primes only (similar to T2 term), but the results was much slower.

P.S.: Code posted in the question is also 100% working, but slower than new one (even if it uses fewer multiplications - because of more memory needed for recursion and not optimized multiplicants order).

like image 149
Spektre Avatar answered Oct 02 '22 13:10

Spektre