Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Speeding up string splitting and concatenation

Tags:

python

primes

I'm trying to solve Project Euler's problem #35

The number, 197, is called a circular prime because all rotations of the digits: 197, 971, and 719, are themselves prime.

How many circular primes are there below one million?

This is my solution:

import numpy as np

def problem(n=100):

    circulars = np.array([], np.int32)

    p = np.array(sieveOfAtkin(n), np.int32)
    for prime in p:
        prime_str = str(prime)
        is_circular = True
        for i in xrange(len(prime_str)):
            m = int(prime_str[i:]+prime_str[:i])
            if not m in p:
                is_circular = False

        if is_circular:
            circulars = np.append(circulars, [prime])

    return len(circulars)

Unfortunately the for-loop is mighty slow! Any ideas how I can speed this up? I suspect the string concatenation is the bottleneck, but I am not entirely sure! :)


Any ideas? :)

like image 623
RadiantHex Avatar asked Feb 25 '23 07:02

RadiantHex


1 Answers

  1. Use a set for membership testing instead of an array. The hash lookup will be O(1) instead of O(n). This is the biggest bottleneck.

  2. Break out of the loop as soon as you see that it's not a circular prime instead of trying the other rotations. This is another bottleneck.


Here, I've isolated the circularity testing into a function to allow the list to be built with a list comprehension. Having it in a function, lets it return False as soon as we know it's not circular. Another alternative would be to do it in a for loop and break when we know it's not circular. Then append to the list in the loop's else clause. Generally speaking, list comps are faster than appending in a loop though. That might not be the case here because it does add function call overhead. If you really care about speed, it would be worth it to profile both options.

primes = set(primes_to_one_million_however_you_want_to_get_them)

def is_circular(prime, primes=primes):
   prime_str = str(prime)
   # With thanks to Sven Marnach's comments
   return all(int(prime_str[i:]+prime_str[:i]) in primes 
              for i in xrange(len(prime_str)))


circular_primes = [p for p in primes if is_circular(p)]

I've also used the trick of passing a global as a default argument to the is_circular function. This means that it can be accessed within the function as a local variable instead of a global variable which is faster.

Here's one way to code it using an else clause on a loop to get rid of that ugly flag and improve efficiency.

circular = []
for p in primes:
   prime_str = str(prime)
   for i in xrange(len(prime_str)):
       if int(prime_str[i:]+prime_str[:i]) not in primes:
            break
   else:
       circular.append(p)
like image 121
aaronasterling Avatar answered Mar 05 '23 12:03

aaronasterling