I know how to delete every 4th element in a numpy array:
frame = np.delete(frame,np.arange(4,frame.size,4))
Now I want to know if there is a simple command that can delete every nth (e.g 4) times 3 values.
A basic example:
input: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20....]
Would lead to:
output: [1,2,3,7,8,9,13,14,15,19,20,....]
I was hoping for a simple numpy / python functionality, rather than writing a function that has to iterate over the vector (cause it is quite long in my case,...).
Thank you for your help
A method using boolean indexing:
def block_delete(a, n, m): #keep n, remove m
mask = np.tile(np.r_[np.ones(n), np.zeros(m)].astype(bool), a.size // (n + m) + 1)[:a.size]
return a[mask]
Compare with @Divakar:
def mod_delete(a, n, m):
return a[np.mod(np.arange(a.size), n + m) < n]
a = np.arange(19) + 1
%timeit block_delete(a, 3, 4)
10000 loops, best of 3: 50.6 µs per loop
%timeit mod_delete(a, 3, 4)
The slowest run took 9.37 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.69 µs per loop
Let's try a longer array:
a = np.arange(999) + 1
%timeit block_delete(a, 3, 4)
The slowest run took 4.61 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 54.8 µs per loop
%timeit mod_delete(a, 3, 4)
The slowest run took 5.13 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 14.5 µs per loop
Still longer:
a = np.arange(999999) + 1
%timeit block_delete(a, 3, 4)
100 loops, best of 3: 3.93 ms per loop
%timeit mod_delete(a, 3, 4)
100 loops, best of 3: 12.3 ms per loop
So which is faster will depend on the size of your array
Approach #1 : Here's one approach with modulus
and boolean-indexing
-
a[np.mod(np.arange(a.size),6)<3]
As a function, it would translate to :
def select_in_groups(a, M, N): # Keep first M, delete next N and so on.
return a[np.mod(np.arange(a.size),M+N)<M]
Sample step-by-step run -
# Input array
In [361]: a
Out[361]:
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20])
# Create a range array that spans along the length of array
In [362]: np.arange(a.size)
Out[362]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19])
# Use modulus to create "intervaled" version of it that shifts at
# the end of each group of 6 elements
In [363]: np.mod(np.arange(a.size),6)
Out[363]: array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1])
# We need to select the first three as valid ones, so compare against 3
# creating a boolean array or mask
In [364]: np.mod(np.arange(a.size),6) < 3
Out[364]:
array([ True, True, True, False, False, False, True, True, True,
False, False, False, True, True, True, False, False, False,
True, True], dtype=bool)
# Use the mask to select valid elements off array
In [365]: a[np.mod(np.arange(a.size),6)<3]
Out[365]: array([ 1, 2, 3, 7, 8, 9, 13, 14, 15, 19, 20])
Approach #2 : For performance, here's another method with NumPy array strides
-
def select_in_groups_strided(a, M, N): # Keep first M, delete next N and so on.
K = M+N
na = a.size
nrows = (1+((na-1)//K))
n = a.strides[0]
out = np.lib.index_tricks.as_strided(a, shape=(nrows,K), strides=(K*n,n))
N = M*(na//K) + (na - (K*(na//K)))
return out[:,:M].ravel()[:N]
Sample runs -
In [545]: a = np.arange(1,21)
In [546]: a
Out[546]:
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20])
In [547]: select_in_groups_strided(a,3,3)
Out[547]: array([ 1, 2, 3, 7, 8, 9, 13, 14, 15, 19, 20])
In [548]: a = np.arange(1,25)
In [549]: a
Out[549]:
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24])
In [550]: select_in_groups_strided(a,3,3)
Out[550]: array([ 1, 2, 3, 7, 8, 9, 13, 14, 15, 19, 20, 21])
Runtime test
Using the same setup as in @Daniel Forsman's timing tests
-
In [637]: a = np.arange(1,21)
In [638]: %timeit block_delete(a,3,3)
10000 loops, best of 3: 21 µs per loop
In [639]: %timeit select_in_groups_strided(a,3,3)
100000 loops, best of 3: 6.44 µs per loop
In [640]: a = np.arange(1,2100)
In [641]: %timeit block_delete(a,3,3)
10000 loops, best of 3: 27 µs per loop
In [642]: %timeit select_in_groups_strided(a,3,3)
100000 loops, best of 3: 9.1 µs per loop
In [643]: a = np.arange(999999) + 1
In [644]: %timeit block_delete(a,3,3)
100 loops, best of 3: 2.24 ms per loop
In [645]: %timeit select_in_groups_strided(a,3,3)
1000 loops, best of 3: 1.12 ms per loop
Strided
one scales pretty well across varying sizes, if you are considering performance.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With