Find Distance to Nearest Zero in NumPy Array

Tags: python, numpy

Let's say I have a NumPy array:

import numpy as np
x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])

At each index, I want to find the distance to the nearest zero value. If the position is itself a zero, the distance is zero. We are only interested in the nearest zero that is to the right of the current position. The super naive approach would be something like:

out = np.full(x.shape[0], x.shape[0]-1)
for i in range(x.shape[0]):
    j = 0
    while i + j < x.shape[0]:
        if x[i+j] == 0:
            break
        j += 1
    out[i] = j

And the output would be:

array([0, 2, 1, 0, 4, 3, 2, 1, 0, 0])

I'm noticing a countdown/decrement pattern in the output between the zeros. So, I might be able to use the locations of the zeros (i.e., zero_indices = np.argwhere(x == 0).flatten()).

What is the fastest way to get the desired output in linear time?

asked Mar 05 '20 21:03

slaw

2 Answers

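Approach #1: np.searchsorted to the rescue, in a vectorized manner (before the numba guys come in)! Note that each lookup is a binary search, so this step is strictly O(n log n) in the worst case rather than linear: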

mask_z = x==0
idx_z = np.flatnonzero(mask_z)
idx_nz = np.flatnonzero(~mask_z)

# Cover for the case when there's no 0 left to the right
# (for same results as with posted loop-based solution)
if x[-1]!=0:
    idx_z = np.r_[idx_z,len(x)]

out = np.zeros(len(x), dtype=int)
idx = np.searchsorted(idx_z, idx_nz)
out[~mask_z] = idx_z[idx] - idx_nz
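
To make the searchsorted step concrete, here is a trace of the intermediate arrays on the example input. It is only a sketch; the names next_zero and dist are illustrative and do not appear in the snippet above.

```python
import numpy as np

x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])

mask_z = x == 0
idx_z = np.flatnonzero(mask_z)     # positions of zeros:     [0 3 8 9]
idx_nz = np.flatnonzero(~mask_z)   # positions of non-zeros: [1 2 4 5 6 7]

# For each non-zero position, find where it would be inserted into idx_z.
# That slot holds the first zero located to its right.
idx = np.searchsorted(idx_z, idx_nz)   # [1 1 2 2 2 2]

next_zero = idx_z[idx]                 # [3 3 8 8 8 8]
dist = next_zero - idx_nz              # [2 1 4 3 2 1]
print(dist)
```

The zeros themselves never go through searchsorted; they keep the 0 that np.zeros put there.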

Approach #2: Another one, using cumsum:

mask_z = x==0
idx_z = np.flatnonzero(mask_z)

# Cover for the case when there's no 0 left to the right
if x[-1]!=0:
    idx_z = np.r_[idx_z,len(x)]

out = idx_z[np.r_[False,mask_z[:-1]].cumsum()] - np.arange(len(x))

Alternatively, the final cumsum indexing step could be replaced with np.repeat:

r = np.r_[idx_z[0]+1,np.diff(idx_z)]
out = np.repeat(idx_z,r)[:len(x)] - np.arange(len(x))
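
Both variants of Approach #2 build the same "index of the next zero" array, just by different routes. The trace below shows this on the example input; it is a sketch, and the names group, next_zero and next_zero_rep are illustrative only.

```python
import numpy as np

x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])
mask_z = x == 0
idx_z = np.flatnonzero(mask_z)                 # [0 3 8 9]

# Shift the zero mask right by one and accumulate: each position gets the
# number of zeros strictly before it, which is the slot of its next zero.
group = np.r_[False, mask_z[:-1]].cumsum()     # [0 1 1 1 2 2 2 2 2 3]
next_zero = idx_z[group]                       # [0 3 3 3 8 8 8 8 8 9]

# The repeat variant gets the same array from the run lengths between zeros.
r = np.r_[idx_z[0] + 1, np.diff(idx_z)]        # [1 3 5 1]
next_zero_rep = np.repeat(idx_z, r)[:len(x)]

out = next_zero - np.arange(len(x))
print(out)                                     # [0 2 1 0 4 3 2 1 0 0]
```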

Approach #3: Another one, with mostly just cumsum:

mask_z = x==0
idx_z = np.flatnonzero(mask_z)

pp = np.full(len(x), -1)
pp[idx_z[:-1]] = np.diff(idx_z) - 1
if idx_z[0]==0:
    pp[0] = idx_z[1]
else:
    pp[0] = idx_z[0]
out = pp.cumsum()

# Handle the boundary case and assign 0s at the original zero positions
out[idx_z[-1]:] = np.arange(len(x)-idx_z[-1],0,-1)
out[mask_z] = 0
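
The idea in Approach #3 is that the output counts down by one at every step and jumps up at each zero, so a cumsum over the step sizes reproduces it. Here is a trace on the example input, as a sketch only. The snippet indexes idx_z[0] and possibly idx_z[1], so it assumes the array contains at least one zero, and at least two if the first element is zero.

```python
import numpy as np

x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])
mask_z = x == 0
idx_z = np.flatnonzero(mask_z)             # [0 3 8 9]

pp = np.full(len(x), -1)                   # default step: count down by one
pp[idx_z[:-1]] = np.diff(idx_z) - 1        # jump up at every zero but the last
pp[0] = idx_z[1] if idx_z[0] == 0 else idx_z[0]   # starting value
# pp is now [ 3 -1 -1  4 -1 -1 -1 -1  0 -1]

out = pp.cumsum()                          # [3 2 1 5 4 3 2 1 1 0]

# Fix up the tail after the last zero, then force 0 at the zero positions.
out[idx_z[-1]:] = np.arange(len(x) - idx_z[-1], 0, -1)
out[mask_z] = 0
print(out)                                 # [0 2 1 0 4 3 2 1 0 0]
```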

answered Nov 23 '22 08:11

Divakar

You could work from the other side. Keep a counter of how many non-zero elements have passed and assign it to the current element of the array. When you see a 0, reset the counter to 0.

Edit: if there is no zero to the right, you need an additional check.

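import numpy as np

x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])
out = np.zeros_like(x)  # separate output array; `out = x` would overwrite x in place
count = 0
hasZero = False  # becomes True once a zero has been seen, scanning from the right
for i in range(x.shape[0]-1, -1, -1):
    if x[i] != 0:
        if not hasZero:
            # No zero to the right: fall back to len(x) - 1 (the fill value from the question)
            out[i] = x.shape[0]-1
        else:
            count += 1
            out[i] = count
    else:
        hasZero = True
        count = 0
print(out)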
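
The same right-to-left scan can also be done without a Python loop. The following is a minimal sketch rather than a drop-in replacement: the function name dist_to_next_zero is hypothetical, and it assumes that positions with no zero to their right should get the distance to the end of the array (len(x) - i, which is what the loop in the question produces) rather than the constant len(x) - 1 used in the loop above.

```python
import numpy as np

def dist_to_next_zero(x):
    """Distance from each position to the nearest zero at or to its right."""
    x = np.asarray(x)
    n = len(x)
    pos = np.arange(n)
    # Zeros keep their own index; every other position gets the sentinel n.
    zero_pos = np.where(x == 0, pos, n)
    # A running minimum taken from the right gives the index of the next zero
    # (or n when there is none).
    next_zero = np.minimum.accumulate(zero_pos[::-1])[::-1]
    return next_zero - pos

print(dist_to_next_zero([0, 1, 2, 0, 4, 5, 6, 7, 0, 0]))  # [0 2 1 0 4 3 2 1 0 0]
```

Every step is a single pass over the array, so the whole thing is linear in len(x).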

answered Nov 23 '22 08:11

MT756
