Project Euler 10 - Why does the first Python code run much faster than the second one?

Question

The 10th problem in Project Euler:

The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.

Find the sum of all the primes below two million.

I found this snippet:

sieve = [True] * 2000000 # Sieve is faster for 2M primes
def mark(sieve, x):
    for i in xrange(x+x, len(sieve), x):
        sieve[i] = False

for x in xrange(2, int(len(sieve) ** 0.5) + 1):
    if sieve[x]: mark(sieve, x)

print sum(i for i in xrange(2, len(sieve)) if sieve[i])

published here, which runs in 3 seconds.

I wrote this code:

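def isprime(n):
    # Trial division: try every candidate divisor from 3 up to sqrt(n).
    # Only called with odd n >= 3, so divisibility by 2 never needs checking.
    for x in xrange(3, int(n**0.5)+1):
        if n % x == 0:
            return False
    return True

total = 2  # 2 is the only even prime; the loop below visits odd numbers only
for i in xrange(3, int(2e6), 2):
    if isprime(i):
        total += i

print total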

Why is my code (the second one) so much slower?

ypercubeᵀᴹ · Accepted Answer

Your algorithm tests each candidate up to N (where N = 2000000) for primality individually, by trial division.

Snippet-1 uses the sieve of Eratosthenes algorithm, discovered about 2200 years ago. Instead of checking every number, it:

Makes a "sieve" of all numbers from 2 to 2000000.
Finds the first number (2), marks it as prime, then deletes all its multiples from the sieve.
Then finds the next undeleted number (3), marks it as prime and deletes all its multiples from the sieve.
Then finds the next undeleted number (5), marks it as prime and deletes all its multiples from the sieve.
...
Until it finds the prime 1409 and deletes all its multiples from the sieve.
At that point all primes up to 1414 ~= sqrt(2000000) have been found, and it stops.
The numbers from 1415 up to 2000000 do not have to be checked. All of those that have not been deleted are primes, too.
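
The steps above can be sketched roughly as follows. This is a minimal illustration, not the snippet from the question: it is written for Python 3 (`range` and `print()` instead of `xrange` and the `print` statement), and the function name `primes_below` is a hypothetical helper chosen for this example.

```python
def primes_below(limit):
    """Return a list of all primes strictly below `limit` (hypothetical helper)."""
    if limit < 2:
        return []
    # is_prime[n] stays True until n is "deleted" as a multiple of a smaller prime.
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    x = 2
    # Only numbers up to sqrt(limit) need to have their multiples deleted.
    while x * x < limit:
        if is_prime[x]:
            # Step through the multiples of x by repeated addition: no division.
            for multiple in range(x + x, limit, x):
                is_prime[multiple] = False
        x += 1
    return [n for n in range(limit) if is_prime[n]]


print(primes_below(30))       # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(sum(primes_below(10)))  # 17
```

Calling `sum(primes_below(2000000))` gives the sum asked for in the problem.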

So the algorithm produces all primes up to N.

Notice that it does no divisions, only additions (not even multiplications; that hardly matters with numbers this small, but it might with bigger ones). Its time complexity is O(n log log n), while your algorithm is somewhere near O(n^(3/2)) (or O(n^(3/2) / log n), as @Daniel Fischer commented), assuming divisions cost the same as multiplications.
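
One way to get a feel for the gap is to count inner-loop steps instead of measuring time. The sketch below is only an illustration, in Python 3, with hypothetical function names: it counts how many sieve entries get marked and how many modulo operations trial division performs for the same limit. The limits are kept small so that it finishes quickly.

```python
def sieve_steps(limit):
    """Count inner-loop markings done by the sieve for numbers below `limit`."""
    is_prime = [True] * limit
    steps = 0
    for x in range(2, int(limit ** 0.5) + 1):
        if is_prime[x]:
            for multiple in range(x + x, limit, x):
                is_prime[multiple] = False
                steps += 1
    return steps


def trial_division_steps(limit):
    """Count modulo operations done by trial division on odd numbers below `limit`."""
    steps = 0
    for n in range(3, limit, 2):
        for x in range(3, int(n ** 0.5) + 1):
            steps += 1
            if n % x == 0:
                break  # n is composite, stop testing divisors
    return steps


for limit in (1000, 10000, 100000):
    print(limit, sieve_steps(limit), trial_division_steps(limit))
```

The trial-division count grows noticeably faster than the sieve count as the limit increases, which is what the two complexity estimates predict.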

From the Wikipedia article (linked above):

Time complexity in the random access machine model is O(n log log n) operations, a direct consequence of the fact that the prime harmonic series asymptotically approaches log log n.

(with n = 2e6 in this case)
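
The claim about the prime harmonic series can be checked numerically. The sketch below (Python 3, hypothetical function name) sums 1/p over the primes up to n and prints it next to log log n plus the Meissel-Mertens constant, which is the known constant offset between the two. The constant is not mentioned in the quote and is included here only for comparison.

```python
import math


def prime_reciprocal_sum(n):
    """Sum 1/p over all primes p <= n, using a simple sieve."""
    is_prime = [True] * (n + 1)
    total = 0.0
    for x in range(2, n + 1):
        if is_prime[x]:
            total += 1.0 / x
            # Starting at x*x is enough: smaller multiples of x were already
            # marked by smaller primes. The range is empty once x*x > n.
            for multiple in range(x * x, n + 1, x):
                is_prime[multiple] = False
    return total


MERTENS = 0.2614972128  # Meissel-Mertens constant (approximate value)

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, prime_reciprocal_sum(n), math.log(math.log(n)) + MERTENS)
```

The two printed columns stay close to each other while both grow extremely slowly, which is why the sieve's total work is only slightly more than linear in n.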

Óscar López · Answer

The first version pre-computes all the primes in the range and stores them in the sieve array; finding the solution is then a simple matter of adding up the primes in the array. It can be seen as a form of memoization.

The second version tests each number in the range to see if it is prime, repeating a lot of work already done by previous calculations.

In conclusion, the first version avoids re-computing values, whereas the second version performs the same operations again and again.
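
To illustrate the point about reusing earlier results, here is a sketch of a middle ground between the two versions. It is not taken from either snippet: it still uses trial division, but remembers the primes found so far and divides only by those. It is written for Python 3, and the function name is hypothetical.

```python
def sum_primes_below(limit):
    """Sum the primes below `limit`, dividing only by primes found earlier."""
    primes = []  # primes discovered so far, in increasing order
    total = 0
    for n in range(2, limit):
        is_prime = True
        for p in primes:
            if p * p > n:
                break  # no prime divisor up to sqrt(n), so n is prime
            if n % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(n)
            total += n
    return total


print(sum_primes_below(10))  # 17
```

This skips the composite divisors that the second version tries again and again, but it still performs divisions for every candidate, so the sieve should remain the faster approach for a limit of two million.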

Tags:

python

primes

number-theory

0x90
