I have an interesting problem but cannot find the relevant literature, so need algorithem suggestions or just the correct way to phrase the problem so I can find the literature.
I'm given a vector of floats of varying length coming from a 2D array. I know the orientation (i.e. which axis I'm looking in). I need to find the start index of the vector. Essentially, I'm cross-correlating a short vector with a long vector.
I have implemented a brute force approach but naturally it grows as O(n^2) where n is the dimension of the 2D array.
Q1) What is an efficient algorithem to tackel this problem?
Q2) How is this type of problem called (so I can find relevant papers)?
There is measurement error so it will never be an exact match.
Here the brute force approach, where I look for the minimum of the norm of two vectors:
import time
import numpy as np
def get_distance(a: np.ndarray, b: np.ndarray) -> float:
""" Calculate the distance between two vectors. """
return np.linalg.norm(a - b)
def find_min_distance_subvector(array: np.ndarray, x: np.ndarray) -> tuple[int, float]:
leng = len(x)
min_index = 0
min_distance = np.inf
# Assuming we know the orientation of the vector
for ii in range(0, len(array) - leng):
# Extract a sub-vector of the same length as x, starting from index ii
sub_vec = array[ii:ii + leng]
dist = get_distance(sub_vec, x)
if dist < min_distance:
min_distance = dist
min_index = ii
return min_index, min_distance
def main():
leng = 100
size = 2000
# Create the search map
arr = np.random.random((size, size))
# Pick a random sub-vector from the map
index = np.random.randint(0, size - leng)
x = arr[index:index + leng]
start_time = time.time()
min_index, min_metric = find_min_distance_subvector(arr, x)
end_time = time.time()
print(f"Minimum distance: {min_metric} at index {min_index}, correct index: {index}, "
f"time taken: {end_time - start_time:.4f} seconds")
if __name__ == '__main__':
main()
Thank you for your help
Since you know the axis you're working on, let us reduce the problem to a single dimension: you want to find a subsequence matching some needle query="abcd" (or the closest thing to a match) inside a big heystack: corpus="azmpqrfksuxbabcdrtwerl". In other words, in a single dimension this problem can be reduced to a string search problem, with all the known algorithms for solving it.
However, if you're not certain your needle is indeed in the heystack and you want to choose instead some minimum distance subsequence instead of returning a False value (as your code might suggest), you have a different problem on your hands and I'm not aware of better than naive solutions for it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With