Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Manhattan distance with n dimensions in oracle

I have a table with around 5 millions rows and each row has 10 columns representing 10 dimensions. I would like to be able when a new input is coming to perform a search in the table to return the closest rows using Manhattan distances. The distance is the sum of abs(Ai-Aj)+abs(Bi-Bj)... The problem is that for the moment if I do a query, it does a full scan of the entire table, to calculate the distances from every rows, and then sort them to find the top X.

Is there a way to speed the process and make the query more efficient?

I looked at the distance function online for the SDO_GEOMETRY, but I couldn't find it for more than 4 dimensions.

Thank you

like image 909
user2010422 Avatar asked Nov 03 '22 05:11

user2010422


1 Answers

If you are inserting a point A and you want to look for points that are within a neighbourhood of radius r (i.e., are less than r away, on any metric), you can do a really simply query:

select x1, x2, ..., xn
from   points
where  x1 between a1 - r and a1 + r
and    x2 between a2 - r and a2 + r
...
and    xn between an - r and an + r

...where A = (a1, a2, ..., an), to find a bound. If you have an index over all x1, ..., xn fields of points, then this query shouldn't require a full scan. Now, this result may include points that are outside the neighbourhood (i.e., the bits in the corners), but is an easy win to find an appropriate subset: you can now check against the records in this subquery, rather than checking against every point in your table.

You may be able to refine this query further because, with the Manhattan metric, a neighbourhood will be square shaped (although at 45 degrees to the above) and squares are relatively easy to work with! (Even in 10 dimensions.) However, the more complicated logic required may be more of an overhead than an optimisation, ultimately.

like image 169
Xophmeister Avatar answered Nov 15 '22 06:11

Xophmeister