
Python 3 generator comprehension to generate chunks including last

If you have a list in Python 3.7:

>>> li
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

You can turn that into a list of chunks each of length n with one of two common Python idioms:

>>> n=3
>>> list(zip(*[iter(li)]*n))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]

This drops the last, incomplete tuple, since (9, 10) is shorter than n.

You can also do:

>>> [li[i:i+n] for i in range(0,len(li),n)]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

if you want the last sublist even if it has fewer than n elements.
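Both idioms can be unpacked a little. The first one works because `[iter(li)]*n` holds n references to the same iterator, so zip ends up pulling n consecutive items per tuple. A minimal sketch of that, with variable names chosen only for illustration:

```python
li = list(range(11))
n = 3

# One shared iterator, referenced n times (not n separate iterators).
it = iter(li)
same_iterator_n_times = [it] * n

# zip takes one item from each reference in turn, i.e. n consecutive
# items from the single underlying iterator. It stops as soon as one
# reference is exhausted, so the trailing 9 and 10 are discarded.
full_chunks = list(zip(*same_iterator_n_times))

# Slicing never runs short: the final slice is simply shorter.
all_chunks = [li[i:i + n] for i in range(0, len(li), n)]

print(full_chunks)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
print(all_chunks)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
```

Note that the slicing version needs len(li) and indexing, which is exactly what a plain generator does not offer.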

Suppose now I have a generator, gen, of unknown length or termination (so calling list(gen) or sum(1 for _ in gen) would not be wise), and I want every chunk.

The best generator expression that I have been able to come up with is something along these lines:

from itertools import zip_longest
n=3                           # chunk size, as above
sentinel=object()             # for use in filtering out ending chunks
gen=(e for e in range(11))    # fill in for the actual gen

# pad the last chunk with the sentinel, then strip the padding again
g3=(t if sentinel not in t else tuple(filter(lambda x: x is not sentinel, t))
    for t in zip_longest(*[iter(gen)]*n, fillvalue=sentinel))

That works for the intended purpose:

>>> next(g3)
(0, 1, 2)
>>> next(g3)
(3, 4, 5)
>>> list(g3)
[(6, 7, 8), (9, 10)]

It just seems -- clumsy. I tried:

  1. using islice but the lack of length seems hard to surmount;
  2. using a sentinel in iter but the sentinel version of iter requires a callable, not an iterable.

Is there a more idiomatic Python 3 technique for a generator of chunks of length n, including a last chunk that might be shorter than n?

I am open to a generator function as well. I am looking for something idiomatic and mostly more readable.


Update:

DSM's method in his deleted answer is very good I think:

>>> from itertools import islice
>>> g3=iter(lambda it=iter(gen): tuple(islice(it, n)), ())
>>> next(g3)
(0, 1, 2)
>>> list(g3)
[(3, 4, 5), (6, 7, 8), (9, 10)]
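Spelled out step by step, that one-liner does roughly the following. This is only a sketch: `take_n` is a name made up for illustration, and `range(11)` stands in for the real generator.

```python
from itertools import islice

n = 3
gen = (e for e in range(11))   # stand-in for the real generator

it = iter(gen)                 # hold on to one single iterator

def take_n():
    # Pull up to n items; returns () once `it` is exhausted.
    return tuple(islice(it, n))

# iter(callable, sentinel) calls take_n() repeatedly and stops as soon
# as the result equals the sentinel, which here is the empty tuple.
chunks = iter(take_n, ())

first = next(chunks)
rest = list(chunks)
print(first)  # (0, 1, 2)
print(rest)   # [(3, 4, 5), (6, 7, 8), (9, 10)]
```

The empty tuple works as a sentinel because a real chunk always holds at least one element, so it can never compare equal to ().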

I am open to this question being a dup, but the linked question is almost 10 years old and focused on a list. Is there no newer method in Python 3 for generators where you don't know the length and don't want more than one chunk at a time?

asked Jul 20 '18 15:07 by dawg


1 Answer

I think this is always going to be messy as long as you're trying to fit it into a one-liner. I would just bite the bullet and go with a generator function here. That is especially useful if you don't know the actual size (say, if gen is an infinite generator).

from itertools import islice

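def chunk(gen, k):
    """Efficiently split `gen` into chunks of size `k`.

       Args:
           gen: Iterable or iterator to chunk.
           k: Number of elements per chunk.

       Yields:
           Chunks as a list.
    """
    it = iter(gen)  # no-op for iterators; stops a list from being re-read from the start
    while True:
        block = list(islice(it, k))  # up to k items; empty once `it` is exhausted
        if not block:
            break
        yield block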

>>> gen = iter(list(range(11)))
>>> list(chunk(gen, 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
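Because each chunk is only built when it is requested, the same approach also copes with a source that never ends. A small sketch, assuming a helper along the lines of the function above (called `chunked` here, a hypothetical name, so it does not clash with it) and `itertools.count` as a stand-in for an infinite generator:

```python
from itertools import count, islice

def chunked(iterable, k):
    """Yield lists of up to k items from iterable (illustrative helper)."""
    it = iter(iterable)          # works for iterables and iterators alike
    while True:
        block = list(islice(it, k))
        if not block:
            return
        yield block

endless = count()                # 0, 1, 2, ... never terminates
first_two = list(islice(chunked(endless, 3), 2))
print(first_two)                 # [[0, 1, 2], [3, 4, 5]]
```

Only six items are ever pulled from `endless`; nothing is counted or materialized up front.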

Someone may have a better suggestion, but this is how I'd do it.
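For what it's worth, newer Python versions (3.12 and later) ship `itertools.batched`, which yields tuples of up to n items and keeps a shorter final batch. The sketch below uses it when available; the fallback for older versions is my own approximation of that behaviour, not the standard library's code.

```python
from itertools import islice

try:
    from itertools import batched      # available from Python 3.12
except ImportError:
    def batched(iterable, n):
        """Fallback sketch for older versions (an approximation)."""
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        while True:
            batch = tuple(islice(it, n))
            if not batch:
                return
            yield batch

gen = (e for e in range(11))           # stand-in for the real generator
result = list(batched(gen, 3))
print(result)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10)]
```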

answered Oct 16 '22 23:10 by cs95