Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delete a series of elements every nth time in numpy array

Tags:

I know how to delete every 4th element in a numpy array:

frame = np.delete(frame,np.arange(4,frame.size,4))

Now I want to know if there is a simple command that can delete every nth (e.g 4) times 3 values.

A basic example:

input: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20....]

Would lead to:

output: [1,2,3,7,8,9,13,14,15,19,20,....]

I was hoping for a simple numpy / python functionality, rather than writing a function that has to iterate over the vector (cause it is quite long in my case,...).

Thank you for your help

like image 963
Kev1n91 Avatar asked Apr 11 '17 11:04

Kev1n91


2 Answers

A method using boolean indexing:

def block_delete(a, n, m):  #keep n, remove m
    mask = np.tile(np.r_[np.ones(n), np.zeros(m)].astype(bool), a.size // (n + m) + 1)[:a.size]
    return a[mask]

Compare with @Divakar:

def mod_delete(a, n, m):
    return a[np.mod(np.arange(a.size), n + m) < n]

a = np.arange(19) + 1

%timeit block_delete(a, 3, 4)
10000 loops, best of 3: 50.6 µs per loop

%timeit mod_delete(a, 3, 4)
The slowest run took 9.37 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.69 µs per loop

Let's try a longer array:

a = np.arange(999) + 1

%timeit block_delete(a, 3, 4)
The slowest run took 4.61 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 54.8 µs per loop

%timeit mod_delete(a, 3, 4)
The slowest run took 5.13 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 14.5 µs per loop

Still longer:

a = np.arange(999999) + 1

%timeit block_delete(a, 3, 4)
100 loops, best of 3: 3.93 ms per loop

%timeit mod_delete(a, 3, 4)
100 loops, best of 3: 12.3 ms per loop

So which is faster will depend on the size of your array

like image 92
Daniel F Avatar answered Oct 11 '22 10:10

Daniel F


Approach #1 : Here's one approach with modulus and boolean-indexing -

a[np.mod(np.arange(a.size),6)<3]

As a function, it would translate to :

def select_in_groups(a, M, N): # Keep first M, delete next N and so on.
    return a[np.mod(np.arange(a.size),M+N)<M]

Sample step-by-step run -

# Input array
In [361]: a
Out[361]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20])

# Create a range array that spans along the length of array
In [362]: np.arange(a.size)
Out[362]: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

# Use modulus to create "intervaled" version of it that shifts at
# the end of each group of 6 elements
In [363]: np.mod(np.arange(a.size),6)
Out[363]: array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1])

# We need to select the first three as valid ones, so compare against 3
# creating a boolean array or mask
In [364]: np.mod(np.arange(a.size),6) < 3
Out[364]: 
array([ True,  True,  True, False, False, False,  True,  True,  True,
       False, False, False,  True,  True,  True, False, False, False,
        True,  True], dtype=bool)

# Use the mask to select valid elements off array
In [365]: a[np.mod(np.arange(a.size),6)<3]
Out[365]: array([ 1,  2,  3,  7,  8,  9, 13, 14, 15, 19, 20])

Approach #2 : For performance, here's another method with NumPy array strides -

def select_in_groups_strided(a, M, N): # Keep first M, delete next N and so on.
    K = M+N
    na = a.size
    nrows = (1+((na-1)//K))
    n = a.strides[0]
    out = np.lib.index_tricks.as_strided(a, shape=(nrows,K), strides=(K*n,n))
    N = M*(na//K) + (na - (K*(na//K)))
    return out[:,:M].ravel()[:N]

Sample runs -

In [545]: a = np.arange(1,21)

In [546]: a
Out[546]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20])

In [547]: select_in_groups_strided(a,3,3)
Out[547]: array([ 1,  2,  3,  7,  8,  9, 13, 14, 15, 19, 20])

In [548]: a = np.arange(1,25)

In [549]: a
Out[549]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24])

In [550]: select_in_groups_strided(a,3,3)
Out[550]: array([ 1,  2,  3,  7,  8,  9, 13, 14, 15, 19, 20, 21])

Runtime test

Using the same setup as in @Daniel Forsman's timing tests -

In [637]: a = np.arange(1,21)

In [638]: %timeit block_delete(a,3,3)
10000 loops, best of 3: 21 µs per loop

In [639]: %timeit select_in_groups_strided(a,3,3)
100000 loops, best of 3: 6.44 µs per loop

In [640]: a = np.arange(1,2100)

In [641]: %timeit block_delete(a,3,3)
10000 loops, best of 3: 27 µs per loop

In [642]: %timeit select_in_groups_strided(a,3,3)
100000 loops, best of 3: 9.1 µs per loop

In [643]: a = np.arange(999999) + 1

In [644]: %timeit block_delete(a,3,3)
100 loops, best of 3: 2.24 ms per loop

In [645]: %timeit select_in_groups_strided(a,3,3)
1000 loops, best of 3: 1.12 ms per loop

Strided one scales pretty well across varying sizes, if you are considering performance.

like image 21
Divakar Avatar answered Oct 11 '22 11:10

Divakar