Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

The Sieve of Atkin

Tags:

I have been trying to learn algorithms for generating prime numbers and I came across Sieve of Atkin on Wikipedia. I understand almost all parts of the algorithm, except a few. Here are the questions:

  1. How are the three quadratic equations below formed? 4x^2+y^2, 3x^2+y^2 and 3x^2-y2
  2. The algorithm in wikipedia talks about modulo sixty but I dont understand how/where that is used in the psudocode below.
  3. How are these reminders 1,5,7 and 11 found?

Below is the pseudocode from Wikipedia for reference:

// arbitrary search limit                                                   
limit ← 1000000                                                             

// initialize the sieve                                                     
for i in [5, limit]: is_prime(i) ← false                                    

// put in candidate primes:                                                 
// integers which have an odd number of                                     
// representations by certain quadratic forms                               
for (x, y) in [1, √limit] × [1, √limit]:                                    
    n ← 4x²+y²                                                              
    if (n ≤ limit) and (n mod 12 = 1 or n mod 12 = 5):                      
        is_prime(n) ← ¬is_prime(n)                                          
    n ← 3x²+y²                                                              
    if (n ≤ limit) and (n mod 12 = 7):                                      
        is_prime(n) ← ¬is_prime(n)                                          
    n ← 3x²-y²                                                              
    if (x > y) and (n ≤ limit) and (n mod 12 = 11):                         
        is_prime(n) ← ¬is_prime(n)                                          

// eliminate composites by sieving                                          
for n in [5, √limit]:                                                       
    if is_prime(n):                                                         
        // n is prime, omit multiples of its square; this is                
        // sufficient because composites which managed to get               
        // on the list cannot be square-free                                
        is_prime(k) ← false, k ∈ {n², 2n², 3n², ..., limit}                 

print 2, 3                                                                  
for n in [5, limit]:                                                        
    if is_prime(n): print n  
like image 645
Shiv Avatar asked Oct 15 '13 18:10

Shiv


People also ask

How does the sieve of Atkin work?

The Sieve of Atkin is an efficient algorithm used to find all prime numbers upto a given number (say N) and does so in O(N) time complexity. With a modified version with enumerating lattice points variation, the time complexity goes to O(N / log log N) .

What is the meaning of Sieve of Eratosthenes?

: a procedure for finding prime numbers that involves writing down the odd numbers from 2 up in succession and crossing out every third number after 3, every fifth after 5 including those already crossed out, every seventh after 7, and so on with the numbers that are never crossed out being prime.

Why was Eratosthenes sieve so important?

Eratosthenes made many important contributions to science and mathematics. His prime number sieve provided a simple way for Greek mathematicians (and frustrated modern students!) to find all prime numbers between any two integers.

Is prime Sieve of Eratosthenes?

The sieve of Eratosthenes algorithm is an ancient algorithm that is used to find all the prime numbers less than given number T. It can be done using O(n*log(log(n))) operations. Using this algorithm we can eliminate all the numbers which are not prime and those that are less than given T.


1 Answers

The Sieve of Atkin pseudo code from the Wikipedia article you've quoted contains the answers to your questions or the discussion about the article for which Will Ness has provided a link does, although you may not be able to put the information together. Short answers are as follows:

  1. The three equations come from Atkin's mathematical proof that all primes will occur as the solutions of one or more of those three equations with the appropriate modulo conditions for all valid values of the variables 'x' and 'y', and in fact he further proved that true primes will be those numbers that have an odd number of solutions to those three equations (left as True when toggled an odd number of times from initial conditions of False) with the modulo conditions for each excluding those numbers which are divisible by squares of found primes less than or equal to the square root of the tested number.

  2. The true Sieve of Atkin is based on a set of modulo 60 conditions; this pseudo code represents a simplification where there are less ranges of conditions for each equation using a set of modulo 12 conditions (5 × 12 = 60) - however, this results in there being 20% extra work being done because of introduced redundancy in this new set of conditions. It is also the reason that this simplified pseudo code needs to start its prime scan at 5 rather than the normal 7 and to do the base primes squares free eliminations starting at a base prime of 5 at an added cost in execution time as the factors of 5 are not otherwise handled properly. The reason for this simplification is perhaps to sacrifice some extra code overhead in order to ease the code complexity in having to check values, although this can be done with a single table look up whereas the extra over 30% in work due to using modulo 12 can not be reduced.

  3. The "reminders" (should be spelled remainders) are found by the 'mod' operators in the pseudo code you quoted which stand for the modulo operator in whatever computer language one might use, often represented by the "%" symbol in computer languages such as C, Java, C#, F#, et cetera. This operator yields the integer remainder after an integer division of the dividend (the first of the numbers, before the 'mod') by the divisor (the second of the numbers, after the 'mod' symbol). The reason that the remainders are just four rather than the 16 as used in the full modulo 60 algorithm is due to the simplifications of reducing to a modulo 12 algorithm. You'll notice that with the modulo 12 conditions, the "4x" condition passes 25 which would normally be eliminated by the modulo 60 conditions, thus many composites containing 25 as a factor need to be eliminated by the additional prime of 5 square free check. Similarly, 55 passes the "3x+" check and 35 passes the "3x-" check where they wouldn't for the full modulo 60 algorithm.

As discussed in the Wikipedia article "Talk" section and hinted above, this quoted pseudo code is never much faster than even basic odds-only Sieve of Eratosthenes (SoE) implementations let alone one that uses the same degree of wheel factorization due to its inefficiencies: the variables 'x' and 'y' do not need to range over the full range to the square root of the range sieved for many cases (mentioned in the article), the proper use of the modulo 60 wheel recovers the redundancies in the use of the modulo 12 simplification (as I mention above), and there are modulo lattice patterns in the solutions to the quadratic equations such that the conditions using the (computationally slow) modulo operations need not be tested by the use of an algorithm that automatically skips those solutions that would not satisfy those modulo conditions according to the lattice patterns (only mentioned very obscurely in the full Atkin and Bernstein paper).

To answer a question you didn't ask but should have: "Why use the Sieve of Atkin?".

The main reason that the Sieve of Atkin (SoA) is implemented rather than the Sieve of Eratosthenes (SoE) is that it is "Internet Knowledge" that the SOA is faster. There are two reasons for this belief, as follows:

  1. The SoA is assumed to be faster because the asymptotic computational complexity is less for it than for the SoE by a factor of log(log n) where n is the range of primes sieved. Practically speaking, going from a range of two to the power 32 (four billion plus) to two to the power 64 (about 2 followed by 19 zeros), this factor is six over five equals 1.2. That means that the true SoE should take 1.2 times as long as expected by just a linear ratio when sieving to the 64-bit number range as compared to the 32-bit number range, whereas the SoA will have a linear relationship if all were ideal. You can appreciate that, first, this is a very small factor for such a huge difference in number ranges and, second, that this benefit only holds if the two algorithms had the same or close to the same performance at some reasonable range of primes.

    What isn't clearly understood by that "Internet knowledge" is that these figures only apply when one is comparing the ratio of performance over a given range as compared to another given range for the same algorithm, not as a comparison between different algorithms. Thus it is useless as a proof that the SoA will be faster than the SoE since the SoA could start with a larger overhead for a given sieve range of a particular SoE algorithm as in the following optimized SoE example.

  2. The SoA is believed to be faster because of the computational comparison made and published by Atkin and Bernstein as per their paper linked in the Wikipedia article. While the work is accurate, it only applies to the artificial limitations that they made on the SoE algorithm they compared: Since the SoA algorithm is based on modulo 60 factorization which represents two 2,3,5 factorization wheel rotations, they also limited the SoE algorithm to that same wheel factorization. Doing this, the SoE does about 424,000 composite number cull operations over the one billion range tested, whereas a fully optimized SoA as tested has about 326,000 combined toggle and square free cull operations, which each take about the same time as each composite number cull operation in the SoE due to being written in the same style. This means that the SoA is about 30% faster than the SoE for this particular set of wheel factorization conditions, and that is just about exactly what the Atkins and Bernstein comparison test showed.

    However, the SoA does not respond to further levels of wheel factorization as the 2,3,5 level is "baked-in" to the algorithm, whereas the SoE does respond to further levels: using a 2,3,5,7 wheel factorization results in about the same number of operations meaning about the same performance for each. One can use even a partial higher level of wheel factorization than the 2,3,5,7 level to get the number of operations for SoE about 16.7% less than SoA, for a proportional better performance. The optimizations required to implement these extra levels of wheel factorization are actually simpler than the code complexity to implement the original optimized SoA. The memory footprint for implementing these comparable page segmented implementations is about the same, being the size of the page buffers plus the base primes array.

    In addition, both would benefit from being written in a "machine state look up" style which would help in better optimization using inlined code and extreme loop unrolling but the SoE befits more from this style than the SoA due to being a simpler algorithm.

So for sieve ranges up to about the 32-bit number range, the maximally optimized SoE is about 20% faster (or more with further wheel factorization) than the SoA; however, the SoA has this asymptotic computational complexity advantage, so there will be some point at which it catches up. That point will be at about the range where the ratio of log (log n) to log (log (2^32)) or 5 is 1.2 or a range of about 2 times ten to the nineteenth power - an exceedingly large number. If the optimized SoA run on a modern desktop computer were to take about a third of a second to compute the primes in the 32-bit number range, and if the implementation were ideal as in taking linear time increase with increasing range, it would take about 45 years to compute the primes to this range. However, analysis of prime numbers in these high ranges is often done in small chunks, for which the use of the SoA would be theoretically practical as compared to the SoE for very large number sieves, but by a very small factor.

EDIT_ADD: In actual fact, neither the page segmented SoE nor the SoA continue to run in linear time for these huge ranges as compared to the lower ranges as both run into problems with the "toggling" and "culling" operations losing efficiency due to having to skip large numbers of pages per jump; this means that both will require modified algorithms to handle this "page jumping" and the very slight advantage to the SoA may be completely erased if there are any slight differences in the way that this is done. The SoA has many more terms in the "jump tables" than the SoE by about the inverse ratio between of the number of primes found up to the square root of the range to that range, and this will likely add a O(log n) term to both in processing but a constant factor larger increase for the SoA which has a higher number of entries in the "jump table". This extra fact will pretty much completely cancel out any advantage of the SoA over the SoE, even for extremely large ranges. Further, the SoA has a constant overhead of more individual loops required for the many more cases in implementing the conditions for the three separate quadratic equations plus the "prime squares free" loop instead of just a primes culling loop so can never have as low an average computational time per operation as the SoE when fully optimized. END_EDIT_ADD

EDIT_ADD2: It is my opinion that much of the confusion about the Sieve of Atkin is due to the deficiencies of the pseudo code from the Wikipedia article as quoted in the question, so have come up with a somewhat better version of the pseudo code that addresses at least some of the deficiencies to do with the range of the 'x' and 'y' variables and the modulo 12 versus modulo 60 confusion as follows:

limit ← 1000000000        // arbitrary search limit

// Initialize the sieve
for i in {7,11,13,17,19,23,29,31, 37,41,43,47,49,53,59,61,...}:
    is_prime(i) ← false

// Put in candidate primes:
// integers which have an odd number of
// representations by certain quadratic forms.
while n ≤ limit, n ← 4x²+y² where x ∈ {1,2,...} and y ∈ {1,3,...} // odd y's
    if n mod 60 ∈ {1,13,17,29,37,41,49,53}:
        is_prime(n) ← ¬is_prime(n)
while n ≤ limit, n ← 3x²+y² where x ∈ {1,3,...} and y ∈ {2,4,...} // only odd 
    if n mod 60 ∈ {7,19,31,43}:                            // x's and even y's
        is_prime(n) ← ¬is_prime(n)
while n ≤ limit, n ← 3x²-y² where x ∈ {2,3,...} and y ∈ {x-1,x-3,...,1} //all 
    if n mod 60 ∈ {11,23,47,59}:                   // even/odd odd/even combos
        is_prime(n) ← ¬is_prime(n)

// Eliminate composites by sieving, only for those occurrences on the wheel
for n² ≤ limit where n ∈ {7,11,13,17,19,23,29,31, 37,41,43,47,49,53,59,61,...}:
    if is_prime(n):
        // n is prime, omit multiples of its square; this is
        // sufficient because composites which managed to get
        // on the list cannot be square-free
        while c ≤ limit, c ← n² × k where
                      k ∈ {1,7,11,13,17,19,23,29, 31,37,41,43,47,49,53,59,...}:
            is_prime(c) ← false

output 2, 3, 5
for n ≤ limit when n ← 60 × k + x where
  k ∈ {0..} and
  x ∈ {7,11,13,17,19,23,29,31, 37,41,43,47,49,53,59,61}:
    if is_prime(n): output n

The above seems quite simple and is quite a good algorithm except that it still isn't faster than a basic Sieve of Eratosthenes that uses the same 2;3;5 factorization wheel because it wastes almost half of its inner loop toggle operations that fail the modulo tests. To demonstrate:

Following is the repeating 4x²+y² pattern of qualifying modulo 60's across a range of 15 values of 'x' (every value) vertically and 15 odd values of 'y' horizontally; both starting at one. Notice that they are symmetrical about a vertical axis but only 128 out of 225 or 256 out of 450 for a full 30 number range of 'x' values are valid toggle operations:

  0 13 29 53  0  0 53 49 53  0  0 53 29 13  0
 17  0 41  0 37 17  0  1  0 17 37  0 41  0 17
 37  0  1  0  0 37  0  0  0 37  0  0  1  0 37
  0 13 29 53  0  0 53 49 53  0  0 53 29 13  0
 41 49  0 29  1 41 29  0 29 41  1 29  0 49 41
  0  0 49 13  0  0 13  0 13  0  0 13 49  0  0
 17  0 41  0 37 17  0  1  0 17 37  0 41  0 17
 17  0 41  0 37 17  0  1  0 17 37  0 41  0 17
  0  0 49 13  0  0 13  0 13  0  0 13 49  0  0
 41 49  0 29  1 41 29  0 29 41  1 29  0 49 41
  0 13 29 53  0  0 53 49 53  0  0 53 29 13  0
 37  0  1  0  0 37  0  0  0 37  0  0  1  0 37
 17  0 41  0 37 17  0  1  0 17 37  0 41  0 17
  0 13 29 53  0  0 53 49 53  0  0 53 29 13  0
  1  0  0 49  0  1 49  0 49  1  0 49  0  0  1

Following is the repeating 3x²+y² pattern of qualifying modulo 60's across a range of 5 odd values of 'x' vertically and 15 even values of 'y' horizontally. Notice that they are symmetrical about a horizontal axis but only 48 out of 75 or 144 out of 450 for a full 30 number range of 'x' values are valid toggle operations as all 144 out of 450 invalid operations with even 'x' and odd 'y' have already been eliminated:

  7 19  0  7 43  0 19 19  0 43  7  0 19  7  0
 31 43  0 31  7  0 43 43  0  7 31  0 43 31  0
 19 31  0 19  0  0 31 31  0  0 19  0 31 19  0
 31 43  0 31  7  0 43 43  0  7 31  0 43 31  0
  7 19  0  7 43  0 19 19  0 43  7  0 19  7  0

Following is the repeating 3x²-y² pattern of qualifying modulo 60's across a range of 5 odd values of 'x' vertically and 15 even values of 'y' horizontally. Notice that they are symmetrical about a horizontal axis but only 48 out of 75 or 224 out of 450 for a full 30 number range of 'x' values are valid toggle operations:

 59 47  0 59 23  0 47 47  0 23 59  0 47 59  0
 23 11  0 23 47  0 11 11  0 47 23  0 11 23  0
 11 59  0 11  0  0 59 59  0  0 11  0 59 11  0
 23 11  0 23 47  0 11 11  0 47 23  0 11 23  0
 59 47  0 59 23  0 47 47  0 23 59  0 47 59  0

Following is the repeating 3x²-y² pattern of qualifying modulo 60's across a range of 5 even values of 'x' vertically and 15 odd values of 'y' horizontally. Notice that they are symmetrical about a vertical axis but only 48 out of 75 or 224 out of 450 for a full 30 number range of 'x' values are valid toggle operations:

 11  0 47 23  0 11 23  0 23 11  0 23 47  0 11
 47  0 23 59  0 47 59  0 59 47  0 59 23  0 47
 47  0 23 59  0 47 59  0 59 47  0 59 23  0 47
 11  0 47 23  0 11 23  0 23 11  0 23 47  0 11
 59  0  0 11  0 59 11  0 11 59  0 11  0  0 59

As one can determine by inspection of the above tables, for the pseudo code as above there is an overall repeating range of 30 values of x (including both evens and odds) which only has 688 valid operations out of a total of 1125 combined operations; thus it wastes almost half of its processing in just conditionally skipping over values due to the non-productive skipping operations being part of the innermost loops.There are two possible methods to eliminate that "hit" redundancy inefficiently, as follows:

  1. The method as outlined in the Atkin and Bernstein paper, which uses the recognized patterns in the combined modulos of 'x' and 'y' to process only the modulo "hits" using the fact that once one locates a given modulo on the pattern, then there are an infinite sequence of values at that some modulo (equals wheel bit position) where each pattern is separated by an easily calculated (variable) offset which has the form "current position plus A times the current value of 'x' plus B" and "current position plus C times the current value of 'y' plus D" where A, B, C, and D, are simple constants (simple meaning small easily manipulated). Bernstein used that method with the additional help of many Look Up Tables (LUTs).

  2. The method of creating overall pattern state Look Up Tables (LUTs) (one for each of the four tables above plus one for the minor prime square free loop) indexed by the modulo current values of 'x' combined with the modulo of 'y' to find the A, B, C, and D parameters to skip, not to the next pattern, but just to the next position on the horizontal sequence. This is likely the more highly performance option as it more easily allows for extreme clock cycle per operation reduction using in-lining of the unrolled loop code and will produce more efficient code for large ranges when page segmenting as the jumps per loop are smaller on average. This will likely make the clock cycles per loop close to that of a highly optimized Sieve of Eratosthenes as in primesieve, but will not likely get to quite that low due to having to compute the variable step sizes rather than being able to use fixed prime offsets for SoE.

So there are three governing objectives in reducing the running time for a prime sieve, as follows:

  1. A successful sieve reduces the number of operations, which even the "hit optimized" SoA fails as compared to a highly wheel factorized SoE by about 16.7% for ranges of about a few billion.

  2. A successful sieve reduces the CPU clock cycles per operation, which the SoA fails as compared to a highly optimized SoE such as primesieve because its operations are more complex due to the variable increments, again likely by 20% to 60%. Atkin and Bernstein's primegen (SoA) takes about 4.5 as compared to about 2.5 clock cycles per operation for primesieve (SoE) for a range of one billion for each, but perhaps the SoA could be sped up somewhat by borrowing some of the optimization techniques from primesieve.

  3. A successful sieve has reasonably low code complexity so that it can be more easily extended to larger ranges using such techniques as "bucket sieving" and other page segmentation optimizations. For this the Sieve of Atkin fails miserably as it gets exponentially more complex for page segmenting large number ranges. It is extremely complex to write a SoA program that would compete with even Bernstein's primegen (SoA) let alone with primesieve, whereas it is quite easy to write code that comes close to the same performance as primesieve.

  4. A successful sieve has a lower empirical complexity, which the SoA does have above the SoE by a factor of (log (log n) where n is the upper range to be sieved, but that extra small ratio is not likely ever enough to make up for the above top two combined loss ratios, as this factor gets smaller with increasing range. END_EDIT_ADD2

So the answer to the question "Why use the Sieve of Atkin?" is "There is no reason to use it at all if the SoE is implemented with maximum wheel factorization optimizations until the numbers sieved are extremely large (64-bit number range and higher) and then the SoA advantage is very small and perhaps not realizable at all depending on very minor tweaks in relative optimizations.".

As an answer to another similar Sieve of Atkin question, I have posted a page segmented C# version of the SoE optimized as per the above at: https://stackoverflow.com/a/20559687/549617.

like image 132
GordonBGood Avatar answered Oct 27 '22 12:10

GordonBGood