Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast array manipulation based on element inclusion in binary matrix

For a large set of randomly distributed points in a 2D lattice, I want to efficiently extract a subarray, which contains only the elements that, approximated as indices, are assigned to non-zero values in a separate 2D binary matrix. Currently, my script is the following:

lat_len = 100 # lattice length
input = np.random.random(size=(1000,2)) * lat_len
binary_matrix = np.random.choice(2, lat_len * lat_len).reshape(lat_len, -1)

def landed(input):
    output = []
    input_as_indices = np.floor(input)
    for i in range(len(input)):
        if binary_matrix[input_as_indices[i,0], input_as_indices[i,1]] == 1:
            output.append(input[i])
    output = np.asarray(output)
    return output   

However, I suspect there must be a better way of doing this. The above script can take quite long to run for 10000 iterations.

like image 316
neither-nor Avatar asked Jun 21 '15 18:06

neither-nor


People also ask

What is manipulation array?

Array Type Manipulation in C++ The array is a data structure in c++ that stored multiple data elements of the same data type in continuous memory locations. In c++ programming language, there are inbuilt functions to manipulate array types. Some functions can also be applied to multidimensional arrays.

Which package is used for array manipulation in Python?

NumPy is a powerful foundational library in Python and can be used to perform a wide variety of mathematical operations on arrays. It guarantees efficient calculations and offers high-level functions that operate on arrays and matrices.

What is array manipulation in NumPy?

Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas (Chapter 3) are built around the NumPy array. This section will present several examples of using NumPy array manipulation to access data and subarrays, and to split, reshape, and join the arrays.


2 Answers

You are correct. The calculation above, can be be done more efficiently without a for loop in python using advanced numpy indexing,

def landed2(input):
    idx = np.floor(input).astype(np.int)
    mask = binary_matrix[idx[:,0], idx[:,1]] == 1
    return input[mask]

res1 = landed(input)
res2 = landed2(input)
np.testing.assert_allclose(res1, res2)

this results in a ~150x speed-up.

like image 184
rth Avatar answered Sep 23 '22 02:09

rth


It seems you can squeeze in a noticeable performance boost if you work with linearly indexed arrays. Here's a vectorized implementation to solve our case, similar to @rth's answer, but using linear indexing -

# Get floor-ed indices
idx = np.floor(input).astype(np.int)

# Calculate linear indices 
lin_idx = idx[:,0]*lat_len + idx[:,1]

# Index raveled/flattened version of binary_matrix with lin_idx
# to extract and form the desired output
out = input[binary_matrix.ravel()[lin_idx] ==1]

Thus, in short we have:

out = input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]

Runtime tests -

This section compares the proposed approach in this solution against the other solution that uses row-column indexing.

Case #1(Original datasizes):

In [62]: lat_len = 100 # lattice length
    ...: input = np.random.random(size=(1000,2)) * lat_len
    ...: binary_matrix = np.random.choice(2, lat_len * lat_len).
                                             reshape(lat_len, -1)
    ...: 

In [63]: idx = np.floor(input).astype(np.int)

In [64]: %timeit input[binary_matrix[idx[:,0], idx[:,1]] == 1]
10000 loops, best of 3: 121 µs per loop

In [65]: %timeit input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]
10000 loops, best of 3: 103 µs per loop

Case #2(Larger datasizes):

In [75]: lat_len = 1000 # lattice length
    ...: input = np.random.random(size=(100000,2)) * lat_len
    ...: binary_matrix = np.random.choice(2, lat_len * lat_len).
                                             reshape(lat_len, -1)
    ...: 

In [76]: idx = np.floor(input).astype(np.int)

In [77]: %timeit input[binary_matrix[idx[:,0], idx[:,1]] == 1]
100 loops, best of 3: 18.5 ms per loop

In [78]: %timeit input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]
100 loops, best of 3: 13.1 ms per loop

Thus, the performance boost with this linear indexing seems to be about 20% - 30%.

like image 36
Divakar Avatar answered Sep 27 '22 02:09

Divakar