I'm currently working through the problems on Project Euler, and so far I've come up with this code for a problem.
```python
from itertools import combinations
import time

def findanums(n):
    l = []
    for i in range(1, n + 1):
        s = []
        for j in range(1, i):
            if i % j == 0:
                s.append(j)
        if sum(s) > i:
            l.append(i)
    return l

start = time.time()  # start time
limit = 28123
anums = findanums(limit + 1)  # abundant numbers (1..limit)
print "done finding abundants", time.time() - start

pairs = combinations(anums, 2)
print "done finding combinations", time.time() - start

sums = map(lambda x: x[0]+x[1], pairs)
print "done finding all possible sums", time.time() - start

print "start main loop"
answer = 0
for i in range(1, limit + 1):
    if i not in sums:
        answer += i
print "ANSWER:", answer
```
When I run this, I run into a `MemoryError`.
The traceback:

```
File "test.py", line 20, in <module>
    sums = map(lambda x: x[0]+x[1], pairs)
```
I've tried to prevent it by disabling garbage collection, a suggestion I found through Google, but to no avail. Am I approaching this the wrong way? In my head this feels like the most natural way to do it, and I'm at a loss at this point.
SIDE NOTE: I'm using PyPy 2.0 Beta2 (Python 2.7.4) because it is so much faster than any other Python implementation I've used, and SciPy/NumPy are over my head, as I'm still just beginning to program and don't have an engineering or strong math background.
As Kevin mentions in the comments, your algorithm might be wrong, but in any case your code is not optimized.
When working with very big lists, it is common to use generators; there is a famous, great Stack Overflow answer explaining the concepts of `yield` and `generator`: What does the "yield" keyword do in Python?
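As a quick toy illustration (my own example, not from the linked answer) of how a generator produces values lazily instead of building a list:

```python
def count_up_to(n):
    # produce values one at a time; nothing is stored in a list
    i = 1
    while i <= n:
        yield i
        i += 1

gen = count_up_to(3)
print next(gen)  # 1
print next(gen)  # 2 -- each value is computed only when asked for
```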
The problem is that your `pairs = combinations(anums, 2)` doesn't give you a generator, but builds a large object, which is much more memory consuming.
I changed your code to use this function; since you iterate over the collection only once, you can evaluate the values lazily:

```python
import itertools

def generator_sol(anums1, s):
    # yield each combination one at a time instead of holding them all
    for comb in itertools.combinations(anums1, s):
        yield comb
```
Now, instead of `pairs = combinations(anums, 2)`, which generates a large object, use:

```python
pairs = generator_sol(anums, 2)
```
Then, instead of using the `lambda`, I would use another generator:

```python
sums_sol = (x[0] + x[1] for x in pairs)
```
Another tip: have a look at `xrange`, which is more suitable here; it doesn't generate a list but an `xrange` object, which is more efficient in many cases (such as this one).
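Putting those pieces together, here is a minimal sketch of how the tail of the script could look. Two caveats that are my additions, not part of the answer above: the sums are collected into a set, because a generator can only be iterated once while the final loop tests membership repeatedly; and this still omits sums of an abundant number with itself, which may be the algorithmic issue Kevin hinted at:

```python
pairs = generator_sol(anums, 2)
sums_sol = (x[0] + x[1] for x in pairs)

# a generator is exhausted after one pass, so materialize the sums;
# the set stays small because distinct sums are bounded by 2 * max(anums)
sums_set = set(sums_sol)

answer = 0
for i in xrange(1, limit + 1):  # xrange builds no full list
    if i not in sums_set:
        answer += i
print "ANSWER:", answer
```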
Let me suggest you use generators. Try changing this:

```python
sums = map(lambda x: x[0]+x[1], pairs)
```

to:

```python
sums = (a + b for (a, b) in pairs)
```
Ofiris' solution is also OK, but it implies that `itertools.combinations` returns a list, which is wrong. If you are going to keep solving Project Euler problems, have a look at the itertools documentation.
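You can check this in an interpreter: `combinations` already returns a lazy iterator, so it is not what exhausts memory; the full list built by `map` is (in Python 2, `map` always returns a list):

```python
from itertools import combinations

pairs = combinations([1, 2, 3], 2)
print pairs        # <itertools.combinations object at 0x...>, not a list
print next(pairs)  # (1, 2) -- items are produced one at a time
```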
The issue is that the collection of pairs is big: with roughly 7,000 abundant numbers below the limit, the matrix of all pairwise sums has about 49 million entries, around 390 MB at 8 bytes each. If you use numpy you can cast `anums` as a `numpy.uint16` array (no sum exceeds 65,535, so it fits, whereas `int16` would overflow), in which case `pairs` comes to roughly 98 MB, which is much more manageable:
```python
import numpy as np

# cast to a 2-D row vector; uint16 suffices since no pairwise sum exceeds 65,535
anums = np.array([anums], dtype=np.uint16)

# compute the sums of all pairs via broadcasting (an outer sum)
pairs = (anums + anums.T).ravel()
```
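From there, one way to finish (a sketch; `expressible` and the final summation are my additions, reusing the question's `limit`):

```python
# mark every value <= limit that occurs among the pairwise sums
expressible = np.zeros(limit + 1, dtype=bool)
expressible[pairs[pairs <= limit]] = True

# the answer is the sum of everything that was never marked
answer = np.arange(limit + 1)[~expressible].sum()
print "ANSWER:", answer
```

Note that the outer sum, unlike `combinations(anums, 2)`, also includes the `x + x` pairs on the diagonal, which this problem actually requires.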