For a large set of randomly distributed points in a 2D lattice, I want to efficiently extract a subarray, which contains only the elements that, approximated as indices, are assigned to non-zero values in a separate 2D binary matrix. Currently, my script is the following:
lat_len = 100 # lattice length
input = np.random.random(size=(1000,2)) * lat_len
binary_matrix = np.random.choice(2, lat_len * lat_len).reshape(lat_len, -1)
def landed(input):
output = []
input_as_indices = np.floor(input)
for i in range(len(input)):
if binary_matrix[input_as_indices[i,0], input_as_indices[i,1]] == 1:
output.append(input[i])
output = np.asarray(output)
return output
However, I suspect there must be a better way of doing this. The above script can take quite long to run for 10000 iterations.
Array Type Manipulation in C++ The array is a data structure in c++ that stored multiple data elements of the same data type in continuous memory locations. In c++ programming language, there are inbuilt functions to manipulate array types. Some functions can also be applied to multidimensional arrays.
NumPy is a powerful foundational library in Python and can be used to perform a wide variety of mathematical operations on arrays. It guarantees efficient calculations and offers high-level functions that operate on arrays and matrices.
Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas (Chapter 3) are built around the NumPy array. This section will present several examples of using NumPy array manipulation to access data and subarrays, and to split, reshape, and join the arrays.
You are correct. The calculation above, can be be done more efficiently without a for loop in python using advanced numpy indexing,
def landed2(input):
idx = np.floor(input).astype(np.int)
mask = binary_matrix[idx[:,0], idx[:,1]] == 1
return input[mask]
res1 = landed(input)
res2 = landed2(input)
np.testing.assert_allclose(res1, res2)
this results in a ~150x speed-up.
It seems you can squeeze in a noticeable performance boost if you work with linearly indexed arrays. Here's a vectorized implementation to solve our case, similar to @rth's answer, but using linear indexing -
# Get floor-ed indices
idx = np.floor(input).astype(np.int)
# Calculate linear indices
lin_idx = idx[:,0]*lat_len + idx[:,1]
# Index raveled/flattened version of binary_matrix with lin_idx
# to extract and form the desired output
out = input[binary_matrix.ravel()[lin_idx] ==1]
Thus, in short we have:
out = input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]
Runtime tests -
This section compares the proposed approach in this solution against the other solution that uses row-column indexing.
Case #1(Original datasizes):
In [62]: lat_len = 100 # lattice length
...: input = np.random.random(size=(1000,2)) * lat_len
...: binary_matrix = np.random.choice(2, lat_len * lat_len).
reshape(lat_len, -1)
...:
In [63]: idx = np.floor(input).astype(np.int)
In [64]: %timeit input[binary_matrix[idx[:,0], idx[:,1]] == 1]
10000 loops, best of 3: 121 µs per loop
In [65]: %timeit input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]
10000 loops, best of 3: 103 µs per loop
Case #2(Larger datasizes):
In [75]: lat_len = 1000 # lattice length
...: input = np.random.random(size=(100000,2)) * lat_len
...: binary_matrix = np.random.choice(2, lat_len * lat_len).
reshape(lat_len, -1)
...:
In [76]: idx = np.floor(input).astype(np.int)
In [77]: %timeit input[binary_matrix[idx[:,0], idx[:,1]] == 1]
100 loops, best of 3: 18.5 ms per loop
In [78]: %timeit input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]
100 loops, best of 3: 13.1 ms per loop
Thus, the performance boost with this linear indexing seems to be about 20%
- 30%
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With