Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Consecutive, Overlapping Subsets of Array (NumPy, Python)

I have a NumPy array [1,2,3,4,5,6,7,8,9,10,11,12,13,14] and want to have an array structured like [[1,2,3,4], [2,3,4,5], [3,4,5,6], ..., [11,12,13,14]].

Sure this is possible by looping over the large array and adding arrays of length four to the new array, but I'm curious if there is some secret 'magic' Python method doing just this :)

like image 851
Sney Avatar asked Mar 21 '10 02:03

Sney


2 Answers

You should use stride_tricks. When I first saw this, the word 'magic' did spring to mind. It's simple and is by far the fastest method.

>>> as_strided = numpy.lib.stride_tricks.as_strided
>>> a = numpy.arange(1,15)
>>> a
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
>>> b = as_strided(a, (11,4), a.strides*2)
>>> b
array([[ 1,  2,  3,  4],
       [ 2,  3,  4,  5],
       [ 3,  4,  5,  6],
       [ 4,  5,  6,  7],
       [ 5,  6,  7,  8],
       [ 6,  7,  8,  9],
       [ 7,  8,  9, 10],
       [ 8,  9, 10, 11],
       [ 9, 10, 11, 12],
       [10, 11, 12, 13],
       [11, 12, 13, 14]])

Be aware that the values in array b are those in a, just viewed differently. Do a .copy() on b if you plan to modify it.

I saw this at a SciPy conference. Here are the slides for more explanation.

like image 176
Paul Avatar answered Oct 30 '22 05:10

Paul


The fastest way seems to be to preallocate the array, given as option 7 right at the bottom of this answer.

>>> import numpy as np
>>> A=np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14])
>>> A
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
>>> np.array(zip(A,A[1:],A[2:],A[3:]))
array([[ 1,  2,  3,  4],
       [ 2,  3,  4,  5],
       [ 3,  4,  5,  6],
       [ 4,  5,  6,  7],
       [ 5,  6,  7,  8],
       [ 6,  7,  8,  9],
       [ 7,  8,  9, 10],
       [ 8,  9, 10, 11],
       [ 9, 10, 11, 12],
       [10, 11, 12, 13],
       [11, 12, 13, 14]])
>>> 

You can easily adapt this to do it for variable chunk size.

>>> n=5
>>> np.array(zip(*(A[i:] for i in range(n))))
array([[ 1,  2,  3,  4,  5],
       [ 2,  3,  4,  5,  6],
       [ 3,  4,  5,  6,  7],
       [ 4,  5,  6,  7,  8],
       [ 5,  6,  7,  8,  9],
       [ 6,  7,  8,  9, 10],
       [ 7,  8,  9, 10, 11],
       [ 8,  9, 10, 11, 12],
       [ 9, 10, 11, 12, 13],
       [10, 11, 12, 13, 14]])

You may wish to compare performance between this and using itertools.islice.

>>> from itertools import islice
>>> n=4
>>> np.array(zip(*[islice(A,i,None) for i in range(n)]))
array([[ 1,  2,  3,  4],
       [ 2,  3,  4,  5],
       [ 3,  4,  5,  6],
       [ 4,  5,  6,  7],
       [ 5,  6,  7,  8],
       [ 6,  7,  8,  9],
       [ 7,  8,  9, 10],
       [ 8,  9, 10, 11],
       [ 9, 10, 11, 12],
       [10, 11, 12, 13],
       [11, 12, 13, 14]])

My timing results:

1. timeit np.array(zip(A,A[1:],A[2:],A[3:]))
10000 loops, best of 3: 92.9 us per loop

2. timeit np.array(zip(*(A[i:] for i in range(4))))
10000 loops, best of 3: 101 us per loop

3. timeit np.array(zip(*[islice(A,i,None) for i in range(4)]))
10000 loops, best of 3: 101 us per loop

4. timeit numpy.array([ A[i:i+4] for i in range(len(A)-3) ])
10000 loops, best of 3: 37.8 us per loop

5. timeit numpy.array(list(chunks(A, 4)))
10000 loops, best of 3: 43.2 us per loop

6. timeit numpy.array(byN(A, 4))
10000 loops, best of 3: 100 us per loop

# Does preallocation of the array help? (11 is from len(A)+1-4)
7. timeit B=np.zeros(shape=(11, 4),dtype=np.int32)
100000 loops, best of 3: 2.19 us per loop
   timeit for i in range(4):B[:,i]=A[i:11+i]
10000 loops, best of 3: 20.9 us per loop
total 23.1us per loop

As len(A) increases (20000) 4 and 5 converge to be equivalent speed (44 ms). 1,2,3 and 6 all remain about 3 times slower (135 ms). 7 is much faster (1.36 ms).

like image 28
John La Rooy Avatar answered Oct 30 '22 07:10

John La Rooy