Given a 2D numpy array:
00111100110111
01110011000110
00111110001000
01101101001110
Is there an efficient way to replace runs of 1
which are >= N
in length?
For example, if N=3
00222200110222
02220011000110
00222220001000
01101101002220
In reality the 2D array is binary and I want to replace runs of 1 with 0, but for clarity I have replace them with 2 in the above example.
Runnable Example: http://runnable.com/U6q0q-TFWzxVd_Uf/numpy-replace-runs-for-python
The code I'm currently using looks a bit hacky and I feel like there is probably some magic numpy way of doing this:
UPDATE: I'm aware that I changed the example to a version which didn't handle corner cases. That was a minor implementation bug (now fixed). I was more interested if there was a a faster way of doing it.
import numpy as np
import time
def replace_runs(a, search, run_length, replace = 2):
a_copy = a.copy() # Don't modify original
for i, row in enumerate(a):
runs = []
current_run = []
for j, val in enumerate(row):
if val == search:
current_run.append(j)
else:
if len(current_run) >= run_length or j == len(row) -1:
runs.append(current_run)
current_run = []
if len(current_run) >= run_length or j == len(row) -1:
runs.append(current_run)
for run in runs:
for col in run:
a_copy[i][col] = replace
return a_copy
arr = np.array([
[0,0,1,1,1,1,0,0,1,1,0,1,1,1],
[0,1,1,1,0,0,1,1,0,0,0,1,1,0],
[0,0,1,1,1,1,1,0,0,0,1,0,0,0],
[0,1,1,0,1,1,0,1,0,0,1,1,1,0],
[1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[1,1,1,1,1,1,1,1,1,1,1,1,1,0],
[0,1,1,1,1,1,1,1,1,1,1,1,1,1],
])
print arr
print replace_runs(arr, 1, 3)
iterations = 100000
t0 = time.time()
for i in range(0,iterations):
replace_runs(arr, 1, 3)
t1 = time.time()
print "replace_runs: %d iterations took %.3fs" % (iterations, t1 - t0)
Output:
[[0 0 1 1 1 1 0 0 1 1 0 1 1 1]
[0 1 1 1 0 0 1 1 0 0 0 1 1 0]
[0 0 1 1 1 1 1 0 0 0 1 0 0 0]
[0 1 1 0 1 1 0 1 0 0 1 1 1 0]
[1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[1 1 1 1 1 1 1 1 1 1 1 1 1 0]
[0 1 1 1 1 1 1 1 1 1 1 1 1 1]]
[[0 0 2 2 2 2 0 0 1 1 0 2 2 2]
[0 2 2 2 0 0 1 1 0 0 0 2 2 0]
[0 0 2 2 2 2 2 0 0 0 1 0 0 0]
[0 1 1 0 1 1 0 1 0 0 2 2 2 0]
[2 2 2 2 2 2 2 2 2 2 2 2 2 2]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[2 2 2 2 2 2 2 2 2 2 2 2 2 0]
[0 2 2 2 2 2 2 2 2 2 2 2 2 2]]
replace_runs: 100000 iterations took 14.406s
using a pattern matching through convolution:
def replace_runs(a, N, replace = 2):
a_copy = a.copy()
pattern = np.ones(N, dtype=int)
M = a_copy.shape[1]
for i, row in enumerate(a_copy):
conv = np.convolve(row, pattern, mode='same')
match = np.where(conv==N)
a_copy[i][match]=replace
a_copy[i][match[0][match[0]-1>0]-1]=replace
a_copy[i][match[0][match[0]+1<M]+1]=replace
return a_copy
3 times slower than the original replace_runs
but detect the corner cases (like the proposed string based approach).
On my machine:
replace_runs_org: 100000 iterations took 12.792s
replace_runs_var: 100000 iterations took 33.112s
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With