Splitting a Python list into a list of overlapping chunks

Tags: python

This question is similar to "Slicing a list into a list of sub-lists", but in my case I want the last element of each sub-list to be repeated as the first element of the next sub-list. The last sub-list must also always contain at least two elements.

For example:

list_ = ['a','b','c','d','e','f','g','h']

The result for a size 3 sub-list:

resultant_list = [['a','b','c'],['c','d','e'],['e','f','g'],['g','h']]
efirvida asked Apr 13 '16 01:04



1 Answer

The list comprehension in the answer you linked is easily adapted to support overlapping chunks: reduce the step passed to range from n to n - m, where n is the group size and m is the overlap:

>>> list_ = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
>>> n = 3  # group size
>>> m = 1  # overlap size
>>> [list_[i:i+n] for i in range(0, len(list_), n-m)]
[['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g'], ['g', 'h']]
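
One edge case is worth checking against the "at least two elements" requirement in the question. For some input lengths the comprehension above ends with a chunk that contains only the repeated overlap: a 7-element list with n = 3 and m = 1 ends with ['g']. A minimal sketch of one way to avoid that is to stop the range m items early. The function name overlapping_slices and its argument checks are illustrative assumptions, not part of the original answer:

```python
def overlapping_slices(seq, n=3, m=1):
    """Split seq into chunks of up to n items; each chunk repeats the
    last m items of the previous chunk. (Hypothetical helper name.)"""
    if n < 1 or not 0 <= m < n:
        raise ValueError("need n >= 1 and 0 <= m < n")
    # Stop m items before the end so that no trailing chunk consists
    # solely of repeated overlap. Inputs no longer than m are kept as
    # a single chunk instead of being dropped.
    stop = len(seq) - m if len(seq) > m else len(seq)
    return [seq[i:i + n] for i in range(0, stop, n - m)]


print(overlapping_slices(list("abcdefgh")))
# [['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g'], ['g', 'h']]
print(overlapping_slices(list("abcdefg")))
# [['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g']]
```

With this stop value, every chunk after the first contributes at least one new element, so for n = 3 and m = 1 the last chunk always has at least two elements whenever the input does.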

Other visitors to this question might not have the luxury of working with an input list (sliceable, known length, finite). Here is a generator-based solution that can work with arbitrary iterables:

from collections import deque

def chunks(iterable, chunk_size=3, overlap=0):
    # we'll use a deque to hold the values because it automatically
    # discards any extraneous elements if it grows too large
    if chunk_size < 1:
        raise ValueError("chunk size too small")
    if overlap >= chunk_size:
        raise ValueError("overlap too large")
    queue = deque(maxlen=chunk_size)
    it = iter(iterable)
    i = 0
    try:
        # start by filling the queue with the first group
        for i in range(chunk_size):
            queue.append(next(it))
        while True:
            yield tuple(queue)
            # after yielding a chunk, get enough elements for the next chunk
            for i in range(chunk_size - overlap):
                queue.append(next(it))
    except StopIteration:
        # if the iterator is exhausted, yield any remaining elements
        i += overlap
        if i > 0:
            yield tuple(queue)[-i:]

Note: I've since released this implementation in wimpy.util.chunks. If you don't mind adding the dependency, you can pip install wimpy and use from wimpy import chunks rather than copy-pasting the code.
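
For comparison, here is a shorter sketch of the same idea built on itertools.islice instead of a deque. It is not the wimpy implementation and behaves slightly differently: it yields lists rather than tuples, and it never emits a trailing chunk that holds only the repeated overlap. The name overlapping_chunks and its parameters are assumptions made for illustration:

```python
from itertools import count, islice


def overlapping_chunks(iterable, size=3, overlap=1):
    """Yield lists of up to `size` items from any iterable; each chunk
    starts with the last `overlap` items of the previous chunk.
    (Hypothetical helper, not the wimpy implementation.)"""
    if size < 1 or not 0 <= overlap < size:
        raise ValueError("need size >= 1 and 0 <= overlap < size")
    it = iter(iterable)
    chunk = list(islice(it, size))
    while chunk:
        yield chunk
        # Pull only the new items needed to complete the next chunk.
        fresh = list(islice(it, size - overlap))
        if not fresh:
            return  # input exhausted: nothing new to report
        # Keep the last `overlap` items (none when overlap == 0).
        chunk = chunk[len(chunk) - overlap:] + fresh


# Works on a one-shot iterator as well as on a list.
print(list(overlapping_chunks(iter("abcdefgh"))))
# [['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g'], ['g', 'h']]

# Works on an infinite iterable, as long as the consumer stops.
print(list(islice(overlapping_chunks(count()), 3)))
# [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
```

The slice uses len(chunk) - overlap rather than -overlap because chunk[-0:] would return the whole list when overlap is 0.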

wim answered Sep 23 '22 16:09