Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

NumPy: Selecting n points every m points

Tags:

python

numpy

If I have a numpy.ndarray that's, say, 300 points in size (1 x 300 for now), and I wanted to select 10 points every 30 points, how would I do that?

In other words: I want the first 10 points, then skip 20, then grab 10 more, and then skip 10... until the end of the array.

like image 220
questionable_code Avatar asked Aug 01 '18 18:08

questionable_code


People also ask

What does .all do in Numpy?

all() in Python. The numpy. all() function tests whether all array elements along the mentioned axis evaluate to True.

How do I select part of an array in Numpy?

To select an element from Numpy Array , we can use [] operator i.e. It will return the element at given index only.

What does [: :] mean on Numpy arrays?

The [:, :] stands for everything from the beginning to the end just like for lists. The difference is that the first : stands for first and the second : for the second dimension. a = numpy. zeros((3, 3)) In [132]: a Out[132]: array([[ 0., 0., 0.], [ 0., 0., 0.], [ 0., 0., 0.]])


2 Answers

To select 10 elements off each block of 30 elements, we can simply reshape into 2D and slice out the first 10 columns from each row -

a.reshape(-1,30)[:,:10]

The benefit is the output would be a view into the input and as such virtually free and without any extra memory overhead. Let's have a sample run to show and prove those -

In [43]: np.random.seed(0)

In [44]: a = np.random.randint(0,9,(1,300))
    
In [48]: np.shares_memory(a,a.reshape(10,30)[0,:,:10])
Out[48]: True

If you need a flattened version, use .ravel() -

a.reshape(-1,30)[:,:10].ravel()

Timings -

In [38]: a = np.random.randint(0,9,(300))

# @sacul's soln
In [39]: %%timeit
    ...: msk = [True] * 10 + [False] * 20
    ...: out = a[np.tile(msk, len(a)//len(msk))]
100000 loops, best of 3: 7.6 µs per loop

# From this post
In [40]: %timeit a.reshape(-1,30)[:,:10].ravel()
1000000 loops, best of 3: 1.07 µs per loop

In [41]: a = np.random.randint(0,9,(3000000))

# @sacul's soln
In [42]: %%timeit
    ...: msk = [True] * 10 + [False] * 20
    ...: out = a[np.tile(msk, len(a)//len(msk))]
100 loops, best of 3: 3.66 ms per loop

# From this post
In [43]: %timeit a.reshape(-1,30)[:,:10].ravel()
100 loops, best of 3: 2.32 ms per loop

# If you are okay with `2D` output, it is virtually free
In [44]: %timeit a.reshape(-1,30)[:,:10]
1000000 loops, best of 3: 519 ns per loop

Generic case with 1D array

A. No. of elements being multiple of block length

For a 1D array a with number of elements being a multiple of n, to select m elements off each block of n elements and get a 1D array output, we would have :

a.reshape(-1,n)[:,:m].ravel()

Note that ravel() flattening part makes a copy there. So, if possible keep the unflattened 2D version for memory efficiency.

Sample run -

In [59]: m,n = 2,5

In [60]: N = 25

In [61]: a = np.random.randint(0,9,(N))

In [62]: a
Out[62]: 
array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1, 5, 8, 4,
       3, 0, 3])

# Select 2 elements off each block of 5 elements
In [63]: a.reshape(-1,n)[:,:m].ravel()
Out[63]: array([5, 0, 3, 5, 6, 8, 7, 7, 8, 4])

B. Generic no. of elements

We would leverage np.lib.stride_tricks.as_strided, inspired by this post to select m elements off each block of n elements -

def skipped_view(a, m, n):
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    shp = ((a.size+n-1)//n,n)
    return strided(a,shape=shp,strides=(n*s,s), writeable=False)[:,:m]

def slice_m_everyn(a, m, n):
    a_slice2D = skipped_view(a,m,n)
    extra = min(m,len(a)-n*(len(a)//n))
    L = m*(len(a)//n) + extra
    return a_slice2D.ravel()[:L]

Note that skipped_view gets us a view into the input array and possibly into memory region not assigned to the input array, but after that we are flattening and slicing to restrict it to our desired output and that's a copy.

Sample run -

In [170]: np.random.seed(0)
     ...: a = np.random.randint(0,9,(16))

In [171]: a
Out[171]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7])

# Select 2 elements off each block of 5 elements
In [172]: slice_m_everyn(a, m=2, n=5)
Out[172]: array([5, 0, 3, 5, 6, 8, 7])

In [173]: np.random.seed(0)
     ...: a = np.random.randint(0,9,(19))

In [174]: a
Out[174]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1])

# Select 2 elements off each block of 5 elements
In [175]: slice_m_everyn(a, m=2, n=5)
Out[175]: array([5, 0, 3, 5, 6, 8, 7, 7])
like image 57
Divakar Avatar answered Sep 28 '22 12:09

Divakar


You could create a mask and index by the mask, repeated until it reaches the length of your array:

msk = [True] * 10 + [False] * 20

arr[np.tile(msk, len(arr)//len(msk))]

Minimal example:

In an array of 30 values, select 1 element, then skip 2 elements:

>>> arr
array([6, 7, 2, 7, 1, 9, 1, 4, 4, 8, 6, 5, 2, 6, 3, 6, 8, 5, 6, 7, 2, 1, 9,
       6, 7, 2, 1, 8, 2, 2])

msk = [True] * 1 + [False] * 2

>>> arr[np.tile(msk, len(arr)//len(msk))]
array([6, 7, 1, 8, 2, 6, 6, 1, 7, 8])

Explanation:

msk is a boolean mask

>>> msk
[True, False, False]

You can then repeat that mask with np.tile, until it is the same length as your original array (i.e. the length of your array divided by the length of your mask):

>>> np.tile(msk, len(arr)//len(msk))
array([ True, False, False,  True, False, False,  True, False, False,
        True, False, False,  True, False, False,  True, False, False,
        True, False, False,  True, False, False,  True, False, False,
        True, False, False], dtype=bool)

Then it's a simple matter of indexing by a boolean, which numpy excels at

like image 24
sacuL Avatar answered Sep 28 '22 12:09

sacuL