If I have a numpy.ndarray
that's, say, 300 points in size (1 x 300 for now), and I wanted to select 10 points every 30 points, how would I do that?
In other words: I want the first 10 points, then skip 20, then grab 10 more, and then skip 10... until the end of the array.
all() in Python. The numpy. all() function tests whether all array elements along the mentioned axis evaluate to True.
To select an element from Numpy Array , we can use [] operator i.e. It will return the element at given index only.
The [:, :] stands for everything from the beginning to the end just like for lists. The difference is that the first : stands for first and the second : for the second dimension. a = numpy. zeros((3, 3)) In [132]: a Out[132]: array([[ 0., 0., 0.], [ 0., 0., 0.], [ 0., 0., 0.]])
To select 10
elements off each block of 30
elements, we can simply reshape into 2D
and slice out the first 10
columns from each row -
a.reshape(-1,30)[:,:10]
The benefit is the output would be a view into the input and as such virtually free and without any extra memory overhead. Let's have a sample run to show and prove those -
In [43]: np.random.seed(0)
In [44]: a = np.random.randint(0,9,(1,300))
In [48]: np.shares_memory(a,a.reshape(10,30)[0,:,:10])
Out[48]: True
If you need a flattened version, use .ravel()
-
a.reshape(-1,30)[:,:10].ravel()
Timings -
In [38]: a = np.random.randint(0,9,(300))
# @sacul's soln
In [39]: %%timeit
...: msk = [True] * 10 + [False] * 20
...: out = a[np.tile(msk, len(a)//len(msk))]
100000 loops, best of 3: 7.6 µs per loop
# From this post
In [40]: %timeit a.reshape(-1,30)[:,:10].ravel()
1000000 loops, best of 3: 1.07 µs per loop
In [41]: a = np.random.randint(0,9,(3000000))
# @sacul's soln
In [42]: %%timeit
...: msk = [True] * 10 + [False] * 20
...: out = a[np.tile(msk, len(a)//len(msk))]
100 loops, best of 3: 3.66 ms per loop
# From this post
In [43]: %timeit a.reshape(-1,30)[:,:10].ravel()
100 loops, best of 3: 2.32 ms per loop
# If you are okay with `2D` output, it is virtually free
In [44]: %timeit a.reshape(-1,30)[:,:10]
1000000 loops, best of 3: 519 ns per loop
1D
arrayA. No. of elements being multiple of block length
For a 1D
array a
with number of elements being a multiple of n
, to select m
elements off each block of n
elements and get a 1D
array output, we would have :
a.reshape(-1,n)[:,:m].ravel()
Note that ravel()
flattening part makes a copy there. So, if possible keep the unflattened 2D
version for memory efficiency.
Sample run -
In [59]: m,n = 2,5
In [60]: N = 25
In [61]: a = np.random.randint(0,9,(N))
In [62]: a
Out[62]:
array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1, 5, 8, 4,
3, 0, 3])
# Select 2 elements off each block of 5 elements
In [63]: a.reshape(-1,n)[:,:m].ravel()
Out[63]: array([5, 0, 3, 5, 6, 8, 7, 7, 8, 4])
B. Generic no. of elements
We would leverage np.lib.stride_tricks.as_strided
, inspired by this post
to select m
elements off each block of n
elements -
def skipped_view(a, m, n):
s = a.strides[0]
strided = np.lib.stride_tricks.as_strided
shp = ((a.size+n-1)//n,n)
return strided(a,shape=shp,strides=(n*s,s), writeable=False)[:,:m]
def slice_m_everyn(a, m, n):
a_slice2D = skipped_view(a,m,n)
extra = min(m,len(a)-n*(len(a)//n))
L = m*(len(a)//n) + extra
return a_slice2D.ravel()[:L]
Note that skipped_view
gets us a view into the input array and possibly into memory region not assigned to the input array, but after that we are flattening and slicing to restrict it to our desired output and that's a copy.
Sample run -
In [170]: np.random.seed(0)
...: a = np.random.randint(0,9,(16))
In [171]: a
Out[171]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7])
# Select 2 elements off each block of 5 elements
In [172]: slice_m_everyn(a, m=2, n=5)
Out[172]: array([5, 0, 3, 5, 6, 8, 7])
In [173]: np.random.seed(0)
...: a = np.random.randint(0,9,(19))
In [174]: a
Out[174]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1])
# Select 2 elements off each block of 5 elements
In [175]: slice_m_everyn(a, m=2, n=5)
Out[175]: array([5, 0, 3, 5, 6, 8, 7, 7])
You could create a mask and index by the mask, repeated until it reaches the length of your array:
msk = [True] * 10 + [False] * 20
arr[np.tile(msk, len(arr)//len(msk))]
Minimal example:
In an array of 30 values, select 1 element, then skip 2 elements:
>>> arr
array([6, 7, 2, 7, 1, 9, 1, 4, 4, 8, 6, 5, 2, 6, 3, 6, 8, 5, 6, 7, 2, 1, 9,
6, 7, 2, 1, 8, 2, 2])
msk = [True] * 1 + [False] * 2
>>> arr[np.tile(msk, len(arr)//len(msk))]
array([6, 7, 1, 8, 2, 6, 6, 1, 7, 8])
Explanation:
msk
is a boolean mask
>>> msk
[True, False, False]
You can then repeat that mask with np.tile
, until it is the same length as your original array (i.e. the length of your array divided by the length of your mask):
>>> np.tile(msk, len(arr)//len(msk))
array([ True, False, False, True, False, False, True, False, False,
True, False, False, True, False, False, True, False, False,
True, False, False, True, False, False, True, False, False,
True, False, False], dtype=bool)
Then it's a simple matter of indexing by a boolean, which numpy
excels at
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With