Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What does sum do?

At first, I want to test the memory usage between generator and list-comprehension.The book give me a little bench code snippet and I run it on my PC(python3.6, Windows),find something unexpected.

  1. On the book, it said, because list-comprehension has to create a real list and allocate memory for it, itering from a list-comprehension must be slower than itering from a generator.
  2. Ofcourse, list-comprehension use more memory than generator.

FOllowing is my code,which is not satisfy previous opinion(within sum function).

import tracemalloc
from time import time


def timeIt(func):
    start = time()
    func()
    print('%s use time' % func.__name__, time() - start)
    return func


tracemalloc.start()

numbers = range(1, 1000000)


@timeIt
def lStyle():
    return sum([i for i in numbers if i % 3 == 0])


@timeIt
def gStyle():
    return sum((i for i in numbers if i % 3 == 0))


lStyle()

gStyle()

shouldSize = [i for i in numbers if i % 3 == 0]

snapshotL = tracemalloc.take_snapshot()
top_stats = snapshotL.statistics('lineno')
print("[ Top 10 ]")
for stat in top_stats[:10]:
    print(stat)

The output:

lStyle use time 0.4460000991821289
gStyle use time 0.6190001964569092
[ Top 10 ]
F:/py3proj/play.py:31: size=11.5 MiB, count=333250, average=36 B
F:/py3proj/play.py:33: size=448 B, count=1, average=448 B
F:/py3proj/play.py:22: size=136 B, count=1, average=136 B
F:/py3proj/play.py:17: size=136 B, count=1, average=136 B
F:/py3proj/play.py:14: size=76 B, count=2, average=38 B
F:/py3proj/play.py:8: size=34 B, count=1, average=34 B

Two points:

  • Generator use more time and same memory space.
  • The list-comprehension in sum function seems not create the total list

I think maybe the sum function did something i don't know.Who can explain this?

The book is High Perfoamance Python.chapter 5.But i did sth myself different from the book to check the validity in other context. And his code is here book_code,he didn't put the list-comprehension in sum funciton.

like image 295
dogewang Avatar asked Oct 28 '22 11:10

dogewang


1 Answers

When it comes to time performance test, I do rely on the timeit module because it automatically executes multiple runs of the code.

On my system, timeit gives following results (I strongly reduced sizes because of the numerous runs):

>>> timeit.timeit("sum([i for i in numbers if i % 3 == 0])", "numbers = range(1, 1000)")
59.54427594248068
>>> timeit.timeit("sum((i for i in numbers if i % 3 == 0))", "numbers = range(1, 1000)")
64.36398425334801

So the generator is slower by about 8% (*). This is not really a surprize because the generator has to execute some code on the fly to get next value, while a precomputed list only increment its current pointer.

Memory evalutation is IMHO more complex, so I have used Compute Memory footprint of an object and its contents (Python recipe) from activestate

>>> numbers = range(1, 100)
>>> numbers = range(1, 100000)
>>> l = [i for i in numbers if i % 3 == 0]
>>> g = (i for i in numbers if i % 3 == 0)
>>> total_size(l)
1218708
>>> total_size(g)
88
>>> total_size(numbers)
48

My interpretation is that a list uses memory for all of its items (which is not a surprize), while a generator only need few values and some code so the memory footprint is much lesser for the generator.

I strongly think that you have used tracemalloc for something it is not intended for. It is aimed at searching possible memory leaks (large blocs of memory never deallocated) and not at controlling the memory used by individual objects.


BEWARE: I could only test for small sizes. But for very large sizes, the list could exhaust the available memory and the machine will use virtual memory from swap. In that case, the list version will become much slower. More details there

like image 145
Serge Ballesta Avatar answered Nov 15 '22 07:11

Serge Ballesta