Length of a finite generator

Tags: python, generator

I have these two implementations to compute the length of a finite generator, while keeping the data for further processing:

import itertools

def count_generator1(generator):
    '''- build a list with the generator data
       - get the length of the data
       - return both the length and the original data (in a list)
       WARNING: the memory use is unbounded, and infinite generators will block this'''
    l = list(generator)
    return len(l), l

def count_generator2(generator):
    '''- get two generators from the original generator
       - get the length of the data from one of them
       - return both the length and the original data, as returned by tee
       WARNING: tee can use up an unbounded amount of memory, and infinite generators will block this'''
    for_length, saved = itertools.tee(generator, 2)
    return sum(1 for _ in for_length), saved

Both have drawbacks, both do the job. Could somebody comment on them, or even offer a better alternative?

asked Aug 02 '13 10:08

blueFast

2 Answers

I ran timeit on 64-bit Python 3.4.3 for Windows against a few approaches I could think of. Each figure is the total time in seconds for timeit's default of 1,000,000 runs:

>>> from timeit import timeit
>>> from textwrap import dedent as d
>>> timeit(
...     d("""
...     count = -1
...     for _ in s:
...         count += 1
...     count += 1
...     """),
...     "s = range(1000)",
... )
50.70772041983173
>>> timeit(
...     d("""
...     count = -1
...     for count, _ in enumerate(s):
...         pass
...     count += 1
...     """),
...     "s = range(1000)",
... )
42.636973504498656
>>> timeit(
...     d("""
...     count, _ = reduce(f, enumerate(range(1000)), (-1, -1))
...     count += 1
...     """),
...     d("""
...     from functools import reduce
...     def f(_, count):
...         return count
...     s = range(1000)
...     """),
... )
121.15513102540672
>>> timeit("count = sum(1 for _ in s)", "s = range(1000)")
58.179126025925825
>>> timeit("count = len(tuple(s))", "s = range(1000)")
19.777029680237774
>>> timeit("count = len(list(s))", "s = range(1000)")
18.145157531932
>>> timeit("count = len(list(1 for _ in s))", "s = range(1000)")
57.41422175998332

Shockingly, the fastest approach was to use a list (not even a tuple) to exhaust the iterator and get the length from there:

>>> timeit("count = len(list(s))", "s = range(1000)")
18.145157531932

Of course, this holds every item in memory at once, which risks memory issues on large inputs. The best low-memory alternative was to use enumerate with a no-op for loop:

>>> timeit(
...     d("""
...     count = -1
...     for count, _ in enumerate(s):
...         pass
...     count += 1
...     """),
...     "s = range(1000)",
... )
42.636973504498656
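For reuse, the enumerate loop above could be wrapped in a small helper. This is only a sketch: the name count_items is my own, and it uses enumerate's start=1 argument so the -1 and += 1 bookkeeping is not needed. Note that it consumes the iterable, so unlike the functions in the question it does not keep the data.

```python
def count_items(iterable):
    """Count the items in an iterable without storing them.

    Hypothetical helper name. The iterable is consumed, so the data
    is not available afterwards.
    """
    count = 0  # stays 0 if the iterable is empty
    # With start=1, `count` is the running total after each item.
    for count, _ in enumerate(iterable, start=1):
        pass
    return count


print(count_items(x for x in range(1000)))  # prints 1000
```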

Cheers!

answered Sep 28 '22 02:09

John Crawford

If you have to do this, the first method is much better: because you consume every value in order to count it, itertools.tee() has to buffer all of the values anyway, so a plain list is more efficient.

To quote from the docs:

This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
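To make the comparison concrete, here is a minimal sketch of both approaches side by side. The function names count_with_list and count_with_tee are my own, and the squares generator is just sample input. Both return the same length and the same data; the difference is that the tee version fully exhausts one branch before the other is touched, so tee has to hold every value internally until the saved iterator is consumed.

```python
import itertools


def count_with_list(iterable):
    # Materialise everything once; len() is then free.
    items = list(iterable)
    return len(items), items


def count_with_tee(iterable):
    # Exhausting `for_length` first forces tee to buffer every value
    # so that `saved` can still yield them later.
    for_length, saved = itertools.tee(iterable, 2)
    return sum(1 for _ in for_length), saved


n_list, data_list = count_with_list(x * x for x in range(5))
n_tee, data_tee = count_with_tee(x * x for x in range(5))

print(n_list, data_list)       # prints 5 [0, 1, 4, 9, 16]
print(n_tee, list(data_tee))   # prints 5 [0, 1, 4, 9, 16]
```

One practical difference: the list can be iterated as many times as needed, while the iterator returned by tee can only be consumed once.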

answered Sep 28 '22 02:09

Gareth Latty
