Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sum of primes below 2,000,000 in python

I am attempting problem 10 of Project Euler, which is the summation of all primes below 2,000,000. I have tried implementing the Sieve of Erasthotenes using Python, and the code I wrote works perfectly for numbers below 10,000.

However, when I attempt to find the summation of primes for bigger numbers, the code takes too long to run (finding the sum of primes up to 100,000 took 315 seconds). The algorithm clearly needs optimization.

Yes, I have looked at other posts on this website, like Fastest way to list all primes below N, but the solutions there had very little explanation as to how the code worked (I am still a beginner programmer) so I was not able to actually learn from them.

Can someone please help me optimize my code, and clearly explain how it works along the way?

Here is my code:

primes_below_number = 2000000 # number to find summation of all primes below number
numbers = (range(1, primes_below_number + 1, 2)) # creates a list excluding even numbers
pos = 0 # index position
sum_of_primes = 0 # total sum
number = numbers[pos]
while number < primes_below_number and pos < len(numbers) - 1:
    pos += 1
    number = numbers[pos] # moves to next prime in list numbers
    sum_of_primes += number # adds prime to total sum
    num = number
    while num < primes_below_number:
        num += number
        if num in numbers[:]:
            numbers.remove(num) # removes multiples of prime found

print sum_of_primes + 2

As I said before, I am new to programming, therefore a thorough explanation of any complicated concepts would be deeply appreciated. Thank you.

like image 350
Daniel H. Avatar asked Oct 31 '22 19:10

Daniel H.


2 Answers

As you've seen, there are various ways to implement the Sieve of Erasthotenes in Python that are more efficient than your code. I don't want to confuse you with fancy code, but I can show how to speed up your code a fair bit.

Firstly, searching a list isn't fast, and removing elements from a list is even slower. However, Python provides a set type which is quite efficient at performing both of those operations (although it does chew up a bit more RAM than a simple list). Happily, it's easy to modify your code to use a set instead of a list.

Another optimization is that we don't have to check for prime factors all the way up to primes_below_number, which I've renamed to hi in the code below. It's sufficient to just go to the square root of hi, since if a number is composite it must have a factor less than or equal to its square root.

We don't need to keep a running total of the sum of the primes. It's better to do that at the end using Python's built-in sum() function, which operates at C speed, so it's much faster than doing the additions one by one at Python speed.

# number to find summation of all primes below number
hi = 2000000

# create a set excluding even numbers
numbers = set(xrange(3, hi + 1, 2)) 

for number in xrange(3, int(hi ** 0.5) + 1):
    if number not in numbers:
        #number must have been removed because it has a prime factor
        continue

    num = number
    while num < hi:
        num += number
        if num in numbers:
            # Remove multiples of prime found
            numbers.remove(num)

print 2 + sum(numbers)

You should find that this code runs in a a few seconds; it takes around 5 seconds on my 2GHz single-core machine.

You'll notice that I've moved the comments so that they're above the line they're commenting on. That's the preferred style in Python since we prefer short lines, and also inline comments tend to make the code look cluttered.

There's another small optimization that can be made to the inner while loop, but I let you figure that out for yourself. :)

like image 75
PM 2Ring Avatar answered Nov 15 '22 05:11

PM 2Ring


First, removing numbers from the list will be very slow. Instead of this, make a list

primes = primes_below_number * True
primes[0] = False
primes[1] = False

Now in your loop, when you find a prime p, change primes[k*p] to False for all suitable k. (You wouldn't actually do multiply, you'd continually add p, of course.)

At the end,

primes = [n for n i range(primes_below_number) if primes[n]]

This should be a great deal faster.

Second, you can stop looking once your find a prime greater than the square root of primes_below_number, since a composite number must have a prime factor that doesn't exceed its square root.

like image 33
saulspatz Avatar answered Nov 15 '22 03:11

saulspatz