Fast array manipulation based on element inclusion in binary matrix

Tags:

For a large set of randomly distributed points in a 2D lattice, I want to efficiently extract a subarray, which contains only the elements that, approximated as indices, are assigned to non-zero values in a separate 2D binary matrix. Currently, my script is the following:

lat_len = 100 # lattice length
input = np.random.random(size=(1000,2)) * lat_len
binary_matrix = np.random.choice(2, lat_len * lat_len).reshape(lat_len, -1)

def landed(input):
    output = []
    input_as_indices = np.floor(input)
    for i in range(len(input)):
        if binary_matrix[input_as_indices[i,0], input_as_indices[i,1]] == 1:
            output.append(input[i])
    output = np.asarray(output)
    return output

However, I suspect there must be a better way of doing this. The above script can take quite long to run for 10000 iterations.

316

asked Jun 21 '15 18:06

neither-nor

2 Answers

You are correct. The calculation above, can be be done more efficiently without a for loop in python using advanced numpy indexing,

def landed2(input):
    idx = np.floor(input).astype(np.int)
    mask = binary_matrix[idx[:,0], idx[:,1]] == 1
    return input[mask]

res1 = landed(input)
res2 = landed2(input)
np.testing.assert_allclose(res1, res2)

this results in a ~150x speed-up.

184

answered Sep 23 '22 02:09

rth

It seems you can squeeze in a noticeable performance boost if you work with linearly indexed arrays. Here's a vectorized implementation to solve our case, similar to @rth's answer, but using linear indexing -

# Get floor-ed indices
idx = np.floor(input).astype(np.int)

# Calculate linear indices 
lin_idx = idx[:,0]*lat_len + idx[:,1]

# Index raveled/flattened version of binary_matrix with lin_idx
# to extract and form the desired output
out = input[binary_matrix.ravel()[lin_idx] ==1]

Thus, in short we have:

out = input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]

Runtime tests -

This section compares the proposed approach in this solution against the other solution that uses row-column indexing.

Case #1(Original datasizes):

In [62]: lat_len = 100 # lattice length
    ...: input = np.random.random(size=(1000,2)) * lat_len
    ...: binary_matrix = np.random.choice(2, lat_len * lat_len).
                                             reshape(lat_len, -1)
    ...: 

In [63]: idx = np.floor(input).astype(np.int)

In [64]: %timeit input[binary_matrix[idx[:,0], idx[:,1]] == 1]
10000 loops, best of 3: 121 µs per loop

In [65]: %timeit input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]
10000 loops, best of 3: 103 µs per loop

Case #2(Larger datasizes):

In [75]: lat_len = 1000 # lattice length
    ...: input = np.random.random(size=(100000,2)) * lat_len
    ...: binary_matrix = np.random.choice(2, lat_len * lat_len).
                                             reshape(lat_len, -1)
    ...: 

In [76]: idx = np.floor(input).astype(np.int)

In [77]: %timeit input[binary_matrix[idx[:,0], idx[:,1]] == 1]
100 loops, best of 3: 18.5 ms per loop

In [78]: %timeit input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]
100 loops, best of 3: 13.1 ms per loop

Thus, the performance boost with this linear indexing seems to be about 20% - 30%.

answered Sep 27 '22 02:09

Divakar

Related questions
                            
                                Python: regex findall
                            
                                paramiko/ Socket closes after returning console object from function
                            
                                How to install a python egg
                            
                                How to do case insensitive sort of a dictionary and store them in OrderedDict
                            
                                How to add "order within group" column in pandas?
                            
                                ManyToManyField does not save in django
                            
                                Create new ttk widget from tkinter
                            
                                Python 2D Gaussian Fit with NaN Values in Data
                            
                                sqlalchemy MSSQL create database statement not allowed within multi-statement transaction
                            
                                connect to Azure SQL database via pyodbc
                            
                                Plot into Pycharm's IPython console
                            
                                Run Lua script from Python
                            
                                Errno 22:invalid mode('rb') or filename:' ' while running a spec file while using pyinstaller
                            
                                Completely remove one index label from a multiindex, in a dataframe
                            
                                How do you specify the foreign key column value in SQLAlchemy?
                            
                                How to get the real cube root of a negative number in Python3?
                            
                                Apache Spark: Error while starting PySpark
                            
                                Fourier transform in scipy and Parseval's identity
                            
                                double recursive education example
                            
                                Selenium - get all children divs but not grandchildren

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Fast array manipulation based on element inclusion in binary matrix

Tags:

performance

python

arrays

numpy

neither-nor

People also ask

2 Answers

rth

Divakar

Recent Activity

Donate For Us