What does sum do?

Question

At first, I want to test the memory usage between generator and list-comprehension.The book give me a little bench code snippet and I run it on my PC(python3.6, Windows),find something unexpected.

On the book, it said, because list-comprehension has to create a real list and allocate memory for it, itering from a list-comprehension must be slower than itering from a generator.
Ofcourse, list-comprehension use more memory than generator.

FOllowing is my code,which is not satisfy previous opinion(within sum function).

import tracemalloc
from time import time


def timeIt(func):
    start = time()
    func()
    print('%s use time' % func.__name__, time() - start)
    return func


tracemalloc.start()

numbers = range(1, 1000000)


@timeIt
def lStyle():
    return sum([i for i in numbers if i % 3 == 0])


@timeIt
def gStyle():
    return sum((i for i in numbers if i % 3 == 0))


lStyle()

gStyle()

shouldSize = [i for i in numbers if i % 3 == 0]

snapshotL = tracemalloc.take_snapshot()
top_stats = snapshotL.statistics('lineno')
print("[ Top 10 ]")
for stat in top_stats[:10]:
    print(stat)

The output:

lStyle use time 0.4460000991821289
gStyle use time 0.6190001964569092
[ Top 10 ]
F:/py3proj/play.py:31: size=11.5 MiB, count=333250, average=36 B
F:/py3proj/play.py:33: size=448 B, count=1, average=448 B
F:/py3proj/play.py:22: size=136 B, count=1, average=136 B
F:/py3proj/play.py:17: size=136 B, count=1, average=136 B
F:/py3proj/play.py:14: size=76 B, count=2, average=38 B
F:/py3proj/play.py:8: size=34 B, count=1, average=34 B

Two points:

Generator use more time and same memory space.
The list-comprehension in sum function seems not create the total list

I think maybe the sum function did something i don't know.Who can explain this?

The book is High Perfoamance Python.chapter 5.But i did sth myself different from the book to check the validity in other context. And his code is here book_code,he didn't put the list-comprehension in sum funciton.

Serge Ballesta · Accepted Answer

When it comes to time performance test, I do rely on the timeit module because it automatically executes multiple runs of the code.

On my system, timeit gives following results (I strongly reduced sizes because of the numerous runs):

>>> timeit.timeit("sum([i for i in numbers if i % 3 == 0])", "numbers = range(1, 1000)")
59.54427594248068
>>> timeit.timeit("sum((i for i in numbers if i % 3 == 0))", "numbers = range(1, 1000)")
64.36398425334801

So the generator is slower by about 8% (*). This is not really a surprize because the generator has to execute some code on the fly to get next value, while a precomputed list only increment its current pointer.

Memory evalutation is IMHO more complex, so I have used Compute Memory footprint of an object and its contents (Python recipe) from activestate

>>> numbers = range(1, 100)
>>> numbers = range(1, 100000)
>>> l = [i for i in numbers if i % 3 == 0]
>>> g = (i for i in numbers if i % 3 == 0)
>>> total_size(l)
1218708
>>> total_size(g)
88
>>> total_size(numbers)
48

My interpretation is that a list uses memory for all of its items (which is not a surprize), while a generator only need few values and some code so the memory footprint is much lesser for the generator.

I strongly think that you have used tracemalloc for something it is not intended for. It is aimed at searching possible memory leaks (large blocs of memory never deallocated) and not at controlling the memory used by individual objects.

BEWARE: I could only test for small sizes. But for very large sizes, the list could exhaust the available memory and the machine will use virtual memory from swap. In that case, the list version will become much slower. More details there

What does sum do?

Tags:

python

generator

python-internals

sum

dogewang

1 Answers

Serge Ballesta

Recent Activity

Donate For Us

What does sum do?

Tags:

python

generator

python-internals

sum

dogewang

1 Answers

Serge Ballesta

Related questions

Recent Activity

Donate For Us