 

Why is my Sieve of Eratosthenes so slow?

I'm solving some problems on Project Euler and had to generate 2 million primes to solve one of them. My implementation of the Sieve of Eratosthenes turned out to be extremely slow, but I don't quite know why. Could someone please explain the major problems with this implementation? I thought it was so pretty, and then I figured out it is utterly terrible :(. I found another implementation online and it was so much faster than mine.

def generatePrimes(upperBound):
    numbers = range(2,upperBound+1)
    primes = []

    while numbers:
        prime = numbers[0]
        primes.append(prime)
        numbers = filter((lambda x: x%prime),numbers)
    return primes

EDIT: Thanks for all the answers! The conclusion is that the filter is the problem, because it goes through every element (instead of only those to be marked as non-prime) and because it creates a new list every time. I rewrote it with good old for loops and one final round of filtering, and it runs much faster. New code:

def generatePrimes(upperBound):
    numbers = range(2, upperBound + 1)

    for i in xrange(len(numbers)):
        if numbers[i] != 0:
            for j in xrange(i + numbers[i], len(numbers), numbers[i]):
                numbers[j] = 0

    primes = filter(lambda x: x, numbers)
    return primes
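(The rewrite above is Python 2 — `xrange`, and `range`/`filter` returning lists. For anyone on Python 3, here is an equivalent sketch of the same idea; the in-place marking is unchanged, only the Python 3 builtins differ:)

```python
def generatePrimes(upper_bound):
    """Mark composites with 0 in place, then filter once at the end (Python 3)."""
    numbers = list(range(2, upper_bound + 1))

    for i in range(len(numbers)):
        if numbers[i] != 0:
            # numbers[i] is prime; zero out all of its multiples
            for j in range(i + numbers[i], len(numbers), numbers[i]):
                numbers[j] = 0

    return [n for n in numbers if n != 0]

print(generatePrimes(30))  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```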
Asked May 29 '15 by Alex

2 Answers

The Sieve of Eratosthenes looks like this:

def sieve(n):
    primality_flags = [True]*(n+1)
    primality_flags[0] = primality_flags[1] = False
    primes = []
    for i, flag in enumerate(primality_flags):
        if flag:
            primes.append(i)
            for j in xrange(2*i, n+1, i):
                primality_flags[j] = False
    return primes

It processes each number once when the outer loop reaches it, and once for every prime that divides it. About 1/2 the numbers are divisible by 2, about 1/3 are divisible by 3, and so on; asymptotically speaking, the average number of times each number will be processed is 1 + the sum of the reciprocals of the primes up to n. This sum is about log(log(n)), so the sieve has asymptotic time complexity O(n*log(log(n))), assuming arithmetic is constant time. This is really good.
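To make that concrete, here is a sketch (Python 3; `sieve_op_count` is a name introduced for illustration) that counts how many times the inner marking loop runs, as a proxy for total work — the count grows roughly like n·log(log(n)):

```python
import math

def sieve_op_count(n):
    """Count inner-loop crossings in the standard sieve up to n."""
    flags = [True] * (n + 1)
    flags[0] = flags[1] = False
    ops = 0
    for i in range(2, n + 1):
        if flags[i]:
            for j in range(2 * i, n + 1, i):
                flags[j] = False
                ops += 1
    return ops

# Compare the measured work to the n*log(log(n)) estimate:
for n in (10**3, 10**4, 10**5):
    print(n, sieve_op_count(n), round(n * math.log(math.log(n))))
```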


Your function doesn't do that. Your filter goes over every element in numbers, regardless of whether it's divisible by prime. Each element is processed for every prime up until the first prime that divides it, and processing the prime p removes about 1/p of the elements of numbers. Letting the sequence of primes be p[0], p[1], p[2], etc., and letting the sequence of sizes of numbers be n[0], n[1], n[2], etc., we have the following approximate recurrence:

n[0] = upperBound - 1
n[1] = n[0] * (p[0]-1)/p[0]
n[2] = n[1] * (p[1]-1)/p[1]
...
n[k+1] = n[k] * (p[k]-1)/p[k]

and your algorithm takes time roughly proportional to the sum of the n values up until numbers is empty. I haven't analyzed the behavior of that series, but calculations show the growth is much worse than O(n*log(log(n))). (EDIT: An analysis I didn't come up with when composing this answer says it's O((n/log(n))^2).)
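The recurrence can also be simulated numerically. Here is a rough sketch (Python 3; `estimate_filter_work` is a name introduced for illustration) that sums the successive list sizes n[0] + n[1] + ... — it is only an approximation of the filter-based algorithm's work, since each filter pass touches every remaining element:

```python
def estimate_filter_work(upper_bound):
    """Approximate total work of the filter-based sieve by summing list sizes,
    following n[k+1] = n[k] * (p[k]-1)/p[k] over the primes p[k] <= upper_bound."""
    # Standard sieve to get the primes up to upper_bound first.
    flags = [True] * (upper_bound + 1)
    flags[0] = flags[1] = False
    primes = []
    for i in range(2, upper_bound + 1):
        if flags[i]:
            primes.append(i)
            for j in range(2 * i, upper_bound + 1, i):
                flags[j] = False

    size = upper_bound - 1        # n[0]: the initial list 2..upper_bound
    total = 0.0
    for p in primes:
        total += size             # one filter pass touches every element
        size *= (p - 1) / p       # about 1/p of the elements are removed
    return total

# Grows much faster than n*log(log(n)) as upper_bound increases:
print(estimate_filter_work(10**4))
```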

Answered Oct 27 '22 by user2357112 supports Monica


Running cProfile shows that most of the time is spent in the filter. Replacing the filter with a list comprehension speeds things up by about a factor of 2.

numbers = [n for n in numbers if n%prime != 0]

But this doesn't really fix the main problem, which is that you are recreating the list of numbers with each iteration, and that is slow. The faster implementations (http://groups.google.com/group/comp.lang.python/msg/f1f10ced88c68c2d) just mark the non-primes by replacing them with 0 or similar.

Answered Oct 27 '22 by James K