Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculations on sliding windows and memoization

I am working on Project Euler Problem 50, which states:

The prime 41, can be written as the sum of six consecutive primes:

41 = 2 + 3 + 5 + 7 + 11 + 13 This is the longest sum of consecutive primes that adds to a prime below one-hundred.

The longest sum of consecutive primes below one-thousand that adds to a prime, contains 21 terms, and is equal to 953.

Which prime, below one-million, can be written as the sum of the most consecutive primes?

For determining the terms in prime P (if it at all can be written as a sum of primes) I use a sliding window of all the primes (in increasing order) up to (but not including) P, and calculate the sum of all these windows, if the sum is equal to the prime considered, I count the length of the window...

This works fine for all primes up to 1000, but for primes up to 10**6 it is very slow, so I was hoping memozation would help; when calculating the sum of sliding windows, a lot of double work is done...(right?)

So I found the standard memoizaton implemention on the net and just pasted it in my code, is this correct? (I have no idea how it is supposed to work here...)

primes = tuple(n for n in range(1, 10**6) if is_prime(n)==True)

count_best = 0


##http://docs.python.org/release/2.3.5/lib/itertools-example.html:
## Slightly modified (first for loop)
from itertools import islice
    def window(seq):
    for n in range(2, len(seq) + 1):

        it = iter(seq)
        result = tuple(islice(it, n))
        if len(result) == n:
            yield result    
        for elem in it:
            result = result[1:] + (elem,)
            yield result   

def memoize(function):
    cache = {}
    def decorated_function(*args):
        if args in cache:
            return cache[args]
        else:
            val = function(*args)
            cache[args] = val
            return val
    return decorated_function


@memoize 


def find_lin_comb(prime):
    global count_best

    for windows in window(primes[0 : primes.index(prime)]):
        if sum(windows) == prime and len(windows) > count_best:
            count_best = len(windows)
            print('Prime: ', prime, 'Terms: ', count_best)


##Find them:
for x in primes[::-1]: find_lin_comb(x)

(btw, the tuple of prime numbers is generated "decently" fast)

All input is appreciated, I am just a hobby programmer, so please don´t get to advanced on me.

Thank you!

Edit: here is a working code paste that doesn´t have ruined indentations: http://pastebin.com/R1NpMqgb

like image 417
luffe Avatar asked Oct 07 '22 19:10

luffe


2 Answers

This works fine for all primes up to 1000, but for primes up to 10**6 it is very slow, so I was hoping memozation would help; when calculating the sum of sliding windows, a lot of double work is done...(right?)

Yes, right. And of course it's slow for the primes up to 106.

Say you have n primes up to N, numbered in increasing order, p_1 = 2, p_2 = 3, .... When considering whether prime no. k is the sum of consecutive primes, you consider all windows [p_i, ..., p_j], for pairs (i,j) with i < j < k. There are (k-1)*(k-2)/2 of them. Going through all k to n, you examine about n³/6 windows in total (counting multiplicity, you're examining w(i.j) in total n-j times). Even ignoring the cost of creating the window and summing it, you can see how it scales badly:

  • For N = 1000, there are n = 168 primes and about 790000 windows to examine (counting multiplicity).
  • For N = 10**6, there are n = 78498 primes and about 8.3*10**13 windows to examine.

Now factor in the work for creating and summing the windows, estimate it low at j-i+1 for summing the j-i+1 primes in w(i,j), the work for p_k is about k³/6, and the total work becomes roughly k**4/24. Something like 33 million steps for N = 1000, peanuts, but nearly 1.6*10**18 for N = 1000000.

A year contains about 3.1*10**7 seconds, with a ~3GHz CPU, that's roughly 1017 clock cycles. So we're talking of an operation needing something like 100 CPU-years (may be a factor of 10 off or so).

You aren't willing to wait that long, I suppose;)

Now, with memoisation, you still look at each window multiple times, but you do the computation of each window only once. That means you need about n³/6 work for the computation of the windows, and look about n³/6 times at any window.

  • Problem 1: You still need to look at windows about 8.3*10**13 times, that's several hours even if looking cost only one cycle.
  • Problem 2: There are about 8.3*10**13 windows to memoise. You don't have that much memory, unless you can use a really large HD.

You can circumvent the memory problem by throwing away data you don't need anymore and only calculating data for the windows when it is needed, but once you know which data you may throw away when, you should be able to see a much better approach.

The longest sum of consecutive primes below one-thousand that adds to a prime, contains 21 terms, and is equal to 953.

What does this tell you about the window generating that sum? Where can it start, where can it stop? How can you use that information to create an efficient algorithm to solve the problem?

like image 145
Daniel Fischer Avatar answered Oct 10 '22 10:10

Daniel Fischer


The memoize decorator adds a wrapper to a function to cache the return value for each value of the argument (each combination of values in case of multiple arguments). It is useful when the function is called multiple times with the same arguments. You can only use it with a pure function, i.e.

  1. The function has no side effects. Changing a global variable and doing output are examples of side effects.
  2. The return value depends only on the values of the arguments, not on some global variables that may change values between calls.

Your find_lin_comb function does not satisfy the above criteria. For one thing, it is called with a different argument every time, for another, the function does not return a value.

like image 28
Janne Karila Avatar answered Oct 10 '22 09:10

Janne Karila