If I have a <code>numpy.ndarray</code> that's, say, 300 points in size (1 x 300 for now), and I wanted to select 10 points every 30 points, how would I do that? In other words: I want the first 10 points, then skip 20, then grab 10 more, and then skip 10... until the end of the array.

To select <code>10</code> elements off each block of <code>30</code> elements, we can simply reshape into <code>2D</code> and slice out the first <code>10</code> columns from each row - <pre class="prettyprint"><code>a.reshape(-1,30)[:,:10] </code></pre> The benefit is the output would be a view into the input and as such virtually free and without any extra memory overhead. Let's have a sample run to show and prove those - <pre class="prettyprint"><code>In [43]: np.random.seed(0) In [44]: a = np.random.randint(0,9,(1,300)) In [48]: np.shares_memory(a,a.reshape(10,30)[0,:,:10]) Out[48]: True </code></pre> If you need a flattened version, use <code>.ravel()</code> - <pre class="prettyprint"><code>a.reshape(-1,30)[:,:10].ravel() </code></pre> Timings - <pre class="prettyprint"><code>In [38]: a = np.random.randint(0,9,(300)) # @sacul's soln In [39]: %%timeit ...: msk = [True] * 10 + [False] * 20 ...: out = a[np.tile(msk, len(a)//len(msk))] 100000 loops, best of 3: 7.6 µs per loop # From this post In [40]: %timeit a.reshape(-1,30)[:,:10].ravel() 1000000 loops, best of 3: 1.07 µs per loop In [41]: a = np.random.randint(0,9,(3000000)) # @sacul's soln In [42]: %%timeit ...: msk = [True] * 10 + [False] * 20 ...: out = a[np.tile(msk, len(a)//len(msk))] 100 loops, best of 3: 3.66 ms per loop # From this post In [43]: %timeit a.reshape(-1,30)[:,:10].ravel() 100 loops, best of 3: 2.32 ms per loop # If you are okay with `2D` output, it is virtually free In [44]: %timeit a.reshape(-1,30)[:,:10] 1000000 loops, best of 3: 519 ns per loop </code></pre> <hr> <h3>Generic case with <code>1D</code> array</h3> A. No. of elements being multiple of block length For a <code>1D</code> array <code>a</code> with number of elements being a multiple of <code>n</code>, to select <code>m</code> elements off each block of <code>n</code> elements and get a <code>1D</code> array output, we would have : <pre class="prettyprint"><code>a.reshape(-1,n)[:,:m].ravel() </code></pre> Note that <code>ravel()</code> flattening part makes a copy there. So, if possible keep the unflattened <code>2D</code> version for memory efficiency. Sample run - <pre class="prettyprint"><code>In [59]: m,n = 2,5 In [60]: N = 25 In [61]: a = np.random.randint(0,9,(N)) In [62]: a Out[62]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1, 5, 8, 4, 3, 0, 3]) # Select 2 elements off each block of 5 elements In [63]: a.reshape(-1,n)[:,:m].ravel() Out[63]: array([5, 0, 3, 5, 6, 8, 7, 7, 8, 4]) </code></pre> B. Generic no. of elements We would leverage <code>np.lib.stride_tricks.as_strided</code>, inspired by <code>this post</code> to select <code>m</code> elements off each block of <code>n</code> elements - <pre class="prettyprint"><code>def skipped_view(a, m, n): s = a.strides[0] strided = np.lib.stride_tricks.as_strided shp = ((a.size+n-1)//n,n) return strided(a,shape=shp,strides=(n*s,s), writeable=False)[:,:m] def slice_m_everyn(a, m, n): a_slice2D = skipped_view(a,m,n) extra = min(m,len(a)-n*(len(a)//n)) L = m*(len(a)//n) + extra return a_slice2D.ravel()[:L] </code></pre> Note that <code>skipped_view</code> gets us a view into the input array and possibly into memory region not assigned to the input array, but after that we are flattening and slicing to restrict it to our desired output and that's a copy. Sample run - <pre class="prettyprint"><code>In [170]: np.random.seed(0) ...: a = np.random.randint(0,9,(16)) In [171]: a Out[171]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7]) # Select 2 elements off each block of 5 elements In [172]: slice_m_everyn(a, m=2, n=5) Out[172]: array([5, 0, 3, 5, 6, 8, 7]) In [173]: np.random.seed(0) ...: a = np.random.randint(0,9,(19)) In [174]: a Out[174]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1]) # Select 2 elements off each block of 5 elements In [175]: slice_m_everyn(a, m=2, n=5) Out[175]: array([5, 0, 3, 5, 6, 8, 7, 7]) </code></pre>

You could create a mask and index by the mask, repeated until it reaches the length of your array: <pre class="prettyprint"><code>msk = [True] * 10 + [False] * 20 arr[np.tile(msk, len(arr)//len(msk))] </code></pre> Minimal example: In an array of 30 values, select 1 element, then skip 2 elements: <pre class="prettyprint"><code>>>> arr array([6, 7, 2, 7, 1, 9, 1, 4, 4, 8, 6, 5, 2, 6, 3, 6, 8, 5, 6, 7, 2, 1, 9, 6, 7, 2, 1, 8, 2, 2]) msk = [True] * 1 + [False] * 2 >>> arr[np.tile(msk, len(arr)//len(msk))] array([6, 7, 1, 8, 2, 6, 6, 1, 7, 8]) </code></pre> Explanation: <code>msk</code> is a boolean mask <pre class="prettyprint"><code>>>> msk [True, False, False] </code></pre> You can then repeat that mask with <code>np.tile</code>, until it is the same length as your original array (i.e. the length of your array divided by the length of your mask): <pre class="prettyprint"><code>>>> np.tile(msk, len(arr)//len(msk)) array([ True, False, False, True, False, False, True, False, False, True, False, False, True, False, False, True, False, False, True, False, False, True, False, False, True, False, False, True, False, False], dtype=bool) </code></pre> Then it's a simple matter of indexing by a boolean, which <code>numpy</code> excels at

NumPy: Selecting n points every m points

2 Answers

To select 10 elements off each block of 30 elements, we can simply reshape into 2D and slice out the first 10 columns from each row -

a.reshape(-1,30)[:,:10]

The benefit is the output would be a view into the input and as such virtually free and without any extra memory overhead. Let's have a sample run to show and prove those -

In [43]: np.random.seed(0)

In [44]: a = np.random.randint(0,9,(1,300))
    
In [48]: np.shares_memory(a,a.reshape(10,30)[0,:,:10])
Out[48]: True

If you need a flattened version, use .ravel() -

a.reshape(-1,30)[:,:10].ravel()

Timings -

In [38]: a = np.random.randint(0,9,(300))

# @sacul's soln
In [39]: %%timeit
    ...: msk = [True] * 10 + [False] * 20
    ...: out = a[np.tile(msk, len(a)//len(msk))]
100000 loops, best of 3: 7.6 µs per loop

# From this post
In [40]: %timeit a.reshape(-1,30)[:,:10].ravel()
1000000 loops, best of 3: 1.07 µs per loop

In [41]: a = np.random.randint(0,9,(3000000))

# @sacul's soln
In [42]: %%timeit
    ...: msk = [True] * 10 + [False] * 20
    ...: out = a[np.tile(msk, len(a)//len(msk))]
100 loops, best of 3: 3.66 ms per loop

# From this post
In [43]: %timeit a.reshape(-1,30)[:,:10].ravel()
100 loops, best of 3: 2.32 ms per loop

# If you are okay with `2D` output, it is virtually free
In [44]: %timeit a.reshape(-1,30)[:,:10]
1000000 loops, best of 3: 519 ns per loop

Generic case with `1D` array

A. No. of elements being multiple of block length

For a 1D array a with number of elements being a multiple of n, to select m elements off each block of n elements and get a 1D array output, we would have :

a.reshape(-1,n)[:,:m].ravel()

Note that ravel() flattening part makes a copy there. So, if possible keep the unflattened 2D version for memory efficiency.

Sample run -

In [59]: m,n = 2,5

In [60]: N = 25

In [61]: a = np.random.randint(0,9,(N))

In [62]: a
Out[62]: 
array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1, 5, 8, 4,
       3, 0, 3])

# Select 2 elements off each block of 5 elements
In [63]: a.reshape(-1,n)[:,:m].ravel()
Out[63]: array([5, 0, 3, 5, 6, 8, 7, 7, 8, 4])

B. Generic no. of elements

We would leverage np.lib.stride_tricks.as_strided, inspired by this post to select m elements off each block of n elements -

def skipped_view(a, m, n):
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    shp = ((a.size+n-1)//n,n)
    return strided(a,shape=shp,strides=(n*s,s), writeable=False)[:,:m]

def slice_m_everyn(a, m, n):
    a_slice2D = skipped_view(a,m,n)
    extra = min(m,len(a)-n*(len(a)//n))
    L = m*(len(a)//n) + extra
    return a_slice2D.ravel()[:L]

Note that skipped_view gets us a view into the input array and possibly into memory region not assigned to the input array, but after that we are flattening and slicing to restrict it to our desired output and that's a copy.

Sample run -

In [170]: np.random.seed(0)
     ...: a = np.random.randint(0,9,(16))

In [171]: a
Out[171]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7])

# Select 2 elements off each block of 5 elements
In [172]: slice_m_everyn(a, m=2, n=5)
Out[172]: array([5, 0, 3, 5, 6, 8, 7])

In [173]: np.random.seed(0)
     ...: a = np.random.randint(0,9,(19))

In [174]: a
Out[174]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1])

# Select 2 elements off each block of 5 elements
In [175]: slice_m_everyn(a, m=2, n=5)
Out[175]: array([5, 0, 3, 5, 6, 8, 7, 7])

answered Sep 28 '22 12:09

Divakar

You could create a mask and index by the mask, repeated until it reaches the length of your array:

msk = [True] * 10 + [False] * 20

arr[np.tile(msk, len(arr)//len(msk))]

Minimal example:

In an array of 30 values, select 1 element, then skip 2 elements:

>>> arr
array([6, 7, 2, 7, 1, 9, 1, 4, 4, 8, 6, 5, 2, 6, 3, 6, 8, 5, 6, 7, 2, 1, 9,
       6, 7, 2, 1, 8, 2, 2])

msk = [True] * 1 + [False] * 2

>>> arr[np.tile(msk, len(arr)//len(msk))]
array([6, 7, 1, 8, 2, 6, 6, 1, 7, 8])

Explanation:

msk is a boolean mask

>>> msk
[True, False, False]

You can then repeat that mask with np.tile, until it is the same length as your original array (i.e. the length of your array divided by the length of your mask):

>>> np.tile(msk, len(arr)//len(msk))
array([ True, False, False,  True, False, False,  True, False, False,
        True, False, False,  True, False, False,  True, False, False,
        True, False, False,  True, False, False,  True, False, False,
        True, False, False], dtype=bool)

Then it's a simple matter of indexing by a boolean, which numpy excels at

answered Sep 28 '22 12:09

sacuL

Related questions
                            
                                tf.assign to variable slice doesn't work inside tf.while_loop
                            
                                What does pyspark need psutil for? (faced "UserWarning: Please install psutil to have better support with spilling")?
                            
                                Why does the shape of the selection of my pandas dataframe is wrong
                            
                                AttributeError: module 'plotly' has no attribute 'plotly'
                            
                                plotly basic example shows no plot in jupyter lab
                            
                                Use custom function in ModelViewSet with Django Rest Framework
                            
                                How can I override Django Rest Framework viewset perform_create() method to set a default value to a field
                            
                                Array in a program
                            
                                python - Sum of digits in 2^1000? [duplicate]
                            
                                How to get exactly one element from list in Jinja?
                            
                                PyQt5 - How to display image in QMainWindow class?
                            
                                create a tensor proto whose content is larger than 2GB
                            
                                how to import selenium WebElement in python
                            
                                Why doesn't concurrent.futures.ThreadPoolExecutor().submit return immediately?
                            
                                Why am I getting a KeyError when trying to use an Enum as a dictionary key in another file?
                            
                                Create barplot from string data using groupby and multiple columns in pandas dataframe
                            
                                How can i extract day of week from timestamp in pandas
                            
                                Python Pandas: Merge Columns of Data Frame with column name into one column
                            
                                Python Selenium Right Click
                            
                                "No such file or directory" when using Windows Linux Subsystem bash with VS Code

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

NumPy: Selecting n points every m points

Tags:

python

numpy

questionable_code

People also ask

2 Answers

Generic case with `1D` array

Divakar

sacuL

Recent Activity

Donate For Us

NumPy: Selecting n points every m points

Tags:

python

numpy

questionable_code

People also ask

2 Answers

Generic case with 1D array

Divakar

sacuL

Related questions

Recent Activity

Donate For Us

Generic case with `1D` array