
Split list into separate but overlapping chunks

Tags: python, list

Let's say I have a list A

A = [1,2,3,4,5,6,7,8,9,10]

I would like to create a new list (say B) using the above list in the following order.

B = [[1,2,3], [3,4,5], [5,6,7], [7,8,9], [9,10,]]

i.e. the first 3 numbers as A[0,1,2] and the second 3 numbers as A[2,3,4] and so on.

I believe there is a function in numpy for such a kind of operation.

Rangooski asked Jul 02 '16 18:07


3 Answers

Simply use Python's built-in list comprehension with list-slicing to do this:

>>> A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> size = 3
>>> step = 2
>>> A = [A[i : i + size] for i in range(0, len(A), step)]

This gives you what you're looking for:

>>> A
[[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9], [9, 10]]

But you'll have to add a couple of lines of validation so that your code doesn't break for unexpected values of size/step (for example, a zero or negative step).
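One possible way to add that validation is sketched below. The function name overlapping_chunks and the keep_partial flag are made up for this illustration; they are not part of the answer above.

```python
def overlapping_chunks(seq, size, step, keep_partial=True):
    """Split seq into chunks of `size` items, starting a new chunk every `step` items."""
    # Reject values that would make the slicing silently return nothing or fail.
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive integers")
    chunks = [seq[i:i + size] for i in range(0, len(seq), step)]
    if not keep_partial:
        # Drop trailing chunks that are shorter than `size`.
        chunks = [chunk for chunk in chunks if len(chunk) == size]
    return chunks


A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(overlapping_chunks(A, 3, 2))
# [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9], [9, 10]]
```

Passing keep_partial=False drops the short [9, 10] chunk at the end, if only full windows are wanted.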

cs95 answered Oct 20 '22 14:10


The 'duplicate' Partition array into N chunks with Numpy suggests np.split - that's fine for non-overlapping splits. The example (added after the close?) overlaps, one element across each subarray. Plus it pads with a 0.

How do you split a list into evenly sized chunks? has some good list answers, with various forms of generator or list comprehension, but at first glance I didn't see any that allow for overlaps, though with a clever use of iterators (such as itertools.tee) that should be possible.
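As a rough sketch of the iterator idea (not taken from any of the linked answers), itertools.tee can make several copies of the input, each advanced by one more item than the last, and zip then lines them up into windows. The name tee_windows is hypothetical, and this version only yields full windows, as tuples.

```python
from itertools import islice, tee


def tee_windows(iterable, size, step):
    """Return an iterator over full windows of `size` items, one every `step` items."""
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive integers")
    copies = tee(iterable, size)
    for offset, it in enumerate(copies):
        # Advance the k-th copy by k items (the "consume" idiom from the itertools docs).
        next(islice(it, offset, offset), None)
    # zip yields every consecutive window; islice keeps only every step-th one.
    return islice(zip(*copies), 0, None, step)


print(list(tee_windows(range(1, 11), 3, 2)))
# [(1, 2, 3), (3, 4, 5), (5, 6, 7), (7, 8, 9)]
```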

We can blame this on poor question wording, but it is not a duplicate.

Working from the example and the comment:

Here my window size is 3, i.e. each split list should have 3 elements, so the first split is [1,2,3]. The step size is 2, so the second split should start from the 3rd element, which makes the 2nd split [3,4,5].

Here is an advanced solution using as_strided

In [64]: ast=np.lib.stride_tricks.as_strided  # shorthand

In [65]: A=np.arange(1,12)

In [66]: ast(A,shape=[5,3],strides=(8,4))
Out[66]: 
array([[ 1,  2,  3],
       [ 3,  4,  5],
       [ 5,  6,  7],
       [ 7,  8,  9],
       [ 9, 10, 11]])

I increased the range of A because I didn't want to deal with the 0 pad.

Choosing the target shape is easy, 5 sets of 3. Choosing the strides requires more knowledge about striding.

In [69]: A.strides
Out[69]: (4,)

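The 1d stride, i.e. the step from one element to the next, is 4 bytes (the size of one element; this session uses a 4-byte integer dtype). The step from one row to the next is 2 elements of the original, or 2*4 = 8 bytes. With an 8-byte integer dtype, which is NumPy's default on many platforms, the equivalent strides would be (16, 8).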
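To avoid hard-coding byte counts, the strides can be derived from the array itself. The helper below is only a sketch under that assumption; the name strided_windows is made up here, and it returns full windows only.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def strided_windows(a, size, step):
    """Return a read-only 2-D view of full windows of a 1-D array."""
    a = np.asarray(a)
    if a.ndim != 1:
        raise ValueError("expected a 1-D array")
    if not 1 <= size <= a.size or step < 1:
        raise ValueError("need 1 <= size <= a.size and step >= 1")
    # Count only windows that fit entirely inside the buffer.
    n_rows = (a.size - size) // step + 1
    item = a.strides[0]  # bytes between neighbouring elements, whatever the dtype
    return as_strided(a, shape=(n_rows, size),
                      strides=(step * item, item), writeable=False)


print(strided_windows(np.arange(1, 12), 3, 2))
```

Because n_rows is computed from the array length, the view should never reach past the end of the data, and writeable=False guards against accidentally modifying overlapping elements.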

as_strided produces a view. Thus changing an element in it will affect the original, and may change overlapping values. Add .copy() to make a copy; math with the strided array will also produce a copy.

Changing the strides can give non overlapping rows - but be careful about the shape - it is possible to access values outside of the original data buffer.

In [82]: ast(A,shape=[4,3],strides=(12,4))
Out[82]: 
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 17]])

In [84]: ast(A,shape=[3,3],strides=(16,4))
Out[84]: 
array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11]])

edit

Newer NumPy versions (1.20 and later) provide sliding_window_view, a safer alternative to as_strided:

np.lib.stride_tricks.sliding_window_view(np.arange(1,10),3)[::2]
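If the trailing partial window should be kept and padded (the question's example apparently padded with a 0), one option is to pad the array before taking the windows. This is only a sketch: padded_windows and its fill parameter are names invented for this illustration, and it needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def padded_windows(a, size, step, fill=0):
    """Windows of `size` items every `step` items, padding the end with `fill`."""
    a = np.asarray(a)
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive integers")
    if a.size >= size:
        # Pad just enough that the last window ends exactly at the end of the data.
        pad = (-(a.size - size)) % step
    else:
        pad = size - a.size
    padded = np.pad(a, (0, pad), constant_values=fill)
    # Take every window, then keep every step-th one.
    return sliding_window_view(padded, size)[::step]


print(padded_windows(np.arange(1, 11), 3, 2))
```

For the ten-element list from the question this should give five rows, the last being [9, 10, 0].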

hpaulj answered Oct 20 '22 15:10


This function that I wrote may help you, although it only outputs full chunks of length len_chunk:

import numpy as np


def overlap(array, len_chunk, len_sep=1):
    """Returns a matrix of all full overlapping chunks of the input `array`, with a chunk
    length of `len_chunk` and a separation length of `len_sep`. Begins with the first full
    chunk in the array. """

    # Number of full chunks that fit (plain int: the np.int alias was removed in NumPy 1.24).
    n_arrays = int(np.ceil((array.size - len_chunk + 1) / len_sep))

    # One copy of the input per output row.
    array_matrix = np.tile(array, n_arrays).reshape(n_arrays, -1)

    # Column indices: row k selects columns k*len_sep ... k*len_sep + len_chunk - 1.
    columns = np.array(((len_sep*np.arange(0, n_arrays)).reshape(n_arrays, -1) + np.tile(
        np.arange(0, len_chunk), n_arrays).reshape(n_arrays, -1)), dtype=np.intp)

    # Row indices: row k of the result reads from row k of array_matrix.
    rows = np.array((np.arange(n_arrays).reshape(n_arrays, -1) + np.tile(
        np.zeros(len_chunk), n_arrays).reshape(n_arrays, -1)), dtype=np.intp)

    return array_matrix[rows, columns]
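
The same index-matrix idea can probably be written more compactly with broadcasting, without tiling the input. The variant below is a sketch, not the author's code; overlap_broadcast is a hypothetical name, and it keeps the same chunk-count formula as above.

```python
import numpy as np


def overlap_broadcast(array, len_chunk, len_sep=1):
    """Return all full chunks of `array` as rows, using one broadcast index matrix."""
    array = np.asarray(array)
    # Same count of full chunks as in the function above.
    n_arrays = int(np.ceil((array.size - len_chunk + 1) / len_sep))
    # Start index of each chunk, as a column vector...
    starts = len_sep * np.arange(n_arrays)
    # ...plus the offsets within a chunk gives an (n_arrays, len_chunk) index matrix.
    idx = starts[:, None] + np.arange(len_chunk)
    return array[idx]


print(overlap_broadcast(np.arange(1, 11), 3, 2))
```

Fancy indexing returns a copy, so unlike a strided view the result can be modified without touching the original array.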

jessebmurray answered Oct 20 '22 13:10