I was surprised by  a timing difference that Daniel Lichtblau pointed out in two ways to get at the differences between the number of prime factors (PrimeOmega) and the number of distinct prime factors (PrimeNu) of an integer, n. So I decided to look into it a bit further.
The functions g1 and g2 below are slight variations of the ones Daniel used as well as three others. They all return the number of square-free integers from 1 through n.  But the differences are fairly dramatic.  Can anyone explain the reasons behind the differences. In particular, why does the Sum in g5 provide such a speed advantage?
g1[n_] := Count[PrimeOmega[Range[n]] - PrimeNu[Range[n]], 0]
g2[n_] := Count[With[{fax = FactorInteger[#]}, 
 Total[fax[[All, 2]]] - Length[fax]] & /@ Range[n], 0]
g3[n_] := Count[SquareFreeQ/@ Range[n], True]
(* g3[n_] := Count[SquareFreeQ[#] & /@ Range[n], True] Mr.Wizard's suggestion
   incorporated above. Better written but no significant increase in speed. *) 
g4[n_] := n - Count[MoebiusMu[Range[n]], 0]
g5[n_] := Sum[MoebiusMu[d]*Floor[(n - 1)/d^2], {d, 1, Sqrt[n - 1]}]  
The comparison:
n = 2^20;
Timing[g1[n]]
Timing[g2[n]]
Timing[g3[n]]
Timing[g4[n]]
Timing[g5[n]]
The results:
{44.5867, 637461}
{11.4228, 637461}
{4.43416, 637461}
{1.00392, 637461}
{0.004478, 637461}
Edit:
Mysticial raised the possibility that the latter routines were reaping the benefits of their order--that some caching may have been going on.
So let's run the comparison in the reverse order:
n = 2^20;
Timing[g5[n]]
Timing[g4[n]]
Timing[g3[n]]
Timing[g2[n]]
Timing[g1[n]]
Results of reversed comparisons:
{0.003755, 637461}
{0.978053, 637461}
{4.59551, 637461}
{11.2047, 637461}
{44.5979, 637461}
Verdict: A reasonable guess, but not supported by the data.
ModebiusMu for smallish numbers has some dedicated fast sieving code that in effect shortcuts actual factorization, just counting as it finds factors. It will not add exponents like PrimeOmega so it really has less of a task. Also it can short-circuit whenever it detects a prime factor with multiplicity.
I believe the cutoff is, coincidently, somewhere around 2^20. Not had time to test but it may indeed get slower a bit higher.
That answers for g4. Obviously g5 is using cleverness on the part of the programmer (nothing wrong with that of course), hence the speed gain.
As for g3, it is in essence a call to g4 in terms of implementation code. The bottleneck, in this case, is preprocessing overhead because it is implemented in mathematica rather than C code. I suspect this would be less visible if larger inputs were used, since in such cases more time would be spent doing the real work rather than in Mathematica interpreter overhead. Again, I have not tested this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With