Find the index of first non-zero element to the right of given elements in python

Tags:

python

vectorization

numpy

numpy-ndarray

I have a 2D numpy.ndarray. Given a list of positions, I want to find the positions of first non-zero elements to the right of the given elements in the same row. Is it possible to vectorize this? I have a huge array and looping is taking too much time.

Eg:

import numpy

matrix = numpy.array([
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1]
])
query = numpy.array([[0,2], [2,1], [1,3], [0,1]])  # each entry is [row, column]

Expected Result:

>> [[0,3], [2,4], [1,4], [0,3]]

Currently I'm doing this using for loops as follows:

for query_point in query:
    y, x = query_point
    # first nonzero strictly to the right of column x in row y
    result_point = numpy.min(numpy.argwhere(matrix[y, x + 1:] == 1)) + x + 1
    print(f'{y}, {result_point}')

PS: I also want to find the first non-zero element to the left as well. I guess the solution for the right point can be easily tweaked to find the left point.


asked Mar 04 '21 17:03

Nagabhushan S N

1 Answer

If your query array is sufficiently dense, you can reverse the computation: find an array of the same size as matrix that gives the index of the next nonzero element in the same row for each location. Then your problem becomes just one of applying query to this index array, which numpy supports directly.

It is actually much easier to find the left index, so let's start with that. We can transform matrix into an array of indices like this:

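import numpy as np

r, c = np.nonzero(matrix)                     # row and column of every nonzero cell
left_ind = np.zeros(matrix.shape, dtype=int)
left_ind[r, c] = c                            # nonzero cells store their own column index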

Now you can find the indices of the preceding nonzero element by using np.maximum.accumulate, similarly to how it is done in this answer: https://stackoverflow.com/a/48252024/2988730

np.maximum.accumulate(left_ind, axis=1, out=left_ind)
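To make the effect of the running maximum concrete, here is a small self-contained sketch that repeats the two steps above on the example matrix from the question and prints the resulting table. Nothing here goes beyond what the answer already describes; it only shows the intermediate result.

```python
import numpy as np

matrix = np.array([
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
])

r, c = np.nonzero(matrix)
left_ind = np.zeros(matrix.shape, dtype=int)
left_ind[r, c] = c  # nonzero cells hold their own column, all others hold 0

# The running maximum along each row carries the last seen column index forward
np.maximum.accumulate(left_ind, axis=1, out=left_ind)
print(left_ind)
# [[0 0 0 3 4]
#  [0 1 1 1 4]
#  [0 0 0 0 4]
#  [0 1 2 3 4]
#  [0 0 0 0 4]]
```

Note that a cell with no nonzero element at or before it would also read 0, which cannot be told apart from a real hit in column 0. That does not happen in this example because the first column is entirely nonzero.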

Now you can index directly into left_ind to get the previous nonzero column index:

left_ind[query[:, 0], query[:, 1]]

or

left_ind[tuple(query.T)]

Now to do the same thing with the right index, you need to reverse the array. But then your indices are no longer ascending, and you risk overwriting any zeros you have in the first column. To solve that, in addition to just reversing the array, you need to reverse the order of the indices:

right_ind = np.zeros(matrix.shape, dtype=int)
right_ind[r, c] = matrix.shape[1] - c

You can use any number larger than matrix.shape[1] as your constant as well. The important thing is that the reversed indices all come out greater than zero so np.maximum.accumulate overwrites the zeros. Now you can use np.maximum.accumulate in the same way on the reversed array:

right_ind = matrix.shape[1] - np.maximum.accumulate(right_ind[:, ::-1], axis=1)[:, ::-1]

In this case, I would recommend against using out=right_ind, since right_ind[:, ::-1] is a view into the same buffer. The operation is buffered, but if your line size is big enough, you may overwrite data unintentionally.
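As a rough sketch, the right-hand steps can be wrapped into one function that writes the accumulated result to a fresh array rather than using out=. The function name right_index_table and the intermediate name flipped are made up for illustration and are not part of numpy.

```python
import numpy as np

def right_index_table(matrix):
    """Hypothetical helper: for every cell, the column of the nearest
    nonzero element at or to the right of it in the same row."""
    n_cols = matrix.shape[1]
    r, c = np.nonzero(matrix)
    flipped = np.zeros(matrix.shape, dtype=int)
    flipped[r, c] = n_cols - c  # values in 1..n_cols, so always above the 0 filler
    # Running maximum over the reversed rows, then reverse back (no out= here)
    acc = np.maximum.accumulate(flipped[:, ::-1], axis=1)[:, ::-1]
    return n_cols - acc  # cells with nothing to their right come out as n_cols

matrix = np.array([
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
])
print(right_index_table(matrix))
# [[0 3 3 3 4]
#  [0 1 4 4 4]
#  [0 4 4 4 4]]

# A row that ends in zeros: the trailing cells get the out-of-range value 4
print(right_index_table(np.array([[0, 1, 0, 0]])))
# [[1 1 4 4]]
```

A side effect of this construction is that a cell with no nonzero element to its right maps to matrix.shape[1], which is never a valid column, so it can serve as a "not found" marker.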

Now you can index the array in the same way as before:

right_ind[(*query.T,)]

In both cases, you need to stack with the first column of query, since that's the row key:

>>> row, col = query.T
>>> np.stack((row, left_ind[row, col]), -1)
array([[0, 0],
       [2, 0],
       [1, 1],
       [0, 0]])
>>> np.stack((row, right_ind[row, col]), -1)
array([[0, 3],
       [2, 4],
       [1, 4],
       [0, 3]])
>>> np.stack((row, left_ind[row, col], right_ind[row, col]), -1)
array([[0, 0, 3],
       [2, 0, 4],
       [1, 1, 4],
       [0, 0, 3]])
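One caveat worth checking against your data: both tables return the query cell itself when that cell is nonzero. The example queries all sit on zeros, so the results match the loop in the question. If a query can land on a nonzero cell and you want the element strictly to its right (as the loop's x + 1 does), one possible approach is to look up column col + 1 in a table padded with one extra column. The names padded and strict_right below are assumptions for illustration only.

```python
import numpy as np

matrix = np.array([
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
])
n_rows, n_cols = matrix.shape

# Same right-hand table as in the answer
r, c = np.nonzero(matrix)
flipped = np.zeros(matrix.shape, dtype=int)
flipped[r, c] = n_cols - c
right_ind = n_cols - np.maximum.accumulate(flipped[:, ::-1], axis=1)[:, ::-1]

# Append a column holding the "not found" value n_cols so col + 1 is always valid
padded = np.concatenate([right_ind, np.full((n_rows, 1), n_cols)], axis=1)

query = np.array([[0, 0], [0, 3], [1, 1], [1, 4]])  # some queries sit on nonzero cells
row, col = query.T
strict_right = padded[row, col + 1]
print(np.stack((row, strict_right), -1))
# [[0 3]
#  [0 4]
#  [1 4]
#  [1 5]]   <- 5 == n_cols means nothing lies to the right
```

The same idea should carry over to the left side by looking up col - 1, but column 0 then needs its own handling, since the left table has no spare value to mark "not found".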

If you plan on sampling most of the rows in the array, either at once, or throughout your program, this will help you speed things up. If, on the other hand, you only need to access a small subset, you can apply this technique only to the rows you need.
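For the small-subset case, a minimal sketch might build the tables only for the rows that actually occur in query, using np.unique with return_inverse=True to map each query back to its row in the reduced tables. The variable names (rows, inverse, sub, flipped) are illustrative choices, not something from the original answer.

```python
import numpy as np

matrix = np.array([
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
])
query = np.array([[0, 2], [2, 1], [1, 3], [0, 1]])

# Distinct queried rows, plus each query's position within that list
rows, inverse = np.unique(query[:, 0], return_inverse=True)
sub = matrix[rows]  # only the rows that are needed
n_cols = sub.shape[1]

r, c = np.nonzero(sub)

left_ind = np.zeros(sub.shape, dtype=int)
left_ind[r, c] = c
np.maximum.accumulate(left_ind, axis=1, out=left_ind)

flipped = np.zeros(sub.shape, dtype=int)
flipped[r, c] = n_cols - c
right_ind = n_cols - np.maximum.accumulate(flipped[:, ::-1], axis=1)[:, ::-1]

col = query[:, 1]
result = np.stack((query[:, 0], left_ind[inverse, col], right_ind[inverse, col]), -1)
print(result)
# [[0 0 3]
#  [2 0 4]
#  [1 1 4]
#  [0 0 3]]
```

Whether this beats processing the full matrix depends on how many distinct rows are queried; it is worth timing on your own data.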


answered Nov 14 '22 21:11

Mad Physicist
