 

Fast ellipsoid(s) intersection algorithm

Let's say I've got 1 million arbitrarily shaped, arbitrarily oriented N-dimensional ellipsoids scattered randomly through N-dimensional space. Given a subset of these ellipsoids, I want to "quickly" determine the set of all ellipsoids that the ellipsoids in the first set intersect.

There's got to be an algorithm for this. What is it? What is its "O" complexity?

asked Jun 10 '11 by JnBrymn


1 Answer

The "O" complexity suffers from the curse of dimensionality if you're allowing for N-dimensional data. (See this wikipedia article for more about that). I recommend borrowing from physics simulation and dividing this problem into a "broad phase" and a narrow phase:

  • The broad phase conservatively finds a drastically smaller set of pairs of potentially overlapping ellipsoids.
  • The narrow phase trims that set of potentially overlapping pairs down to the pairs that actually do overlap.
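As a concrete illustration, here is a minimal sketch of that pipeline in Python with NumPy. It assumes each ellipsoid is given by a center c and a positive-definite shape matrix A, with interior {x : (x - c)^T A (x - c) <= 1}; the half-width of its axis-aligned bounding box along coordinate i is then sqrt((A^-1)_ii). The function names are mine, not from any particular library:

```python
import numpy as np

def ellipsoid_aabb(center, A):
    """Axis-aligned bounding box of {x : (x-c)^T A (x-c) <= 1}.

    The half-width along coordinate i is sqrt((A^-1)_ii).
    Returns (mins, maxs) as two length-N arrays.
    """
    half = np.sqrt(np.diag(np.linalg.inv(A)))
    return center - half, center + half

def broad_phase(query_boxes, data_boxes):
    """Conservative pass: yield (i, j) pairs whose AABBs overlap.

    The naive double loop is a placeholder; a spatial structure
    (discussed below) replaces it in a real implementation.
    """
    for i, (qmin, qmax) in enumerate(query_boxes):
        for j, (dmin, dmax) in enumerate(data_boxes):
            if np.all(qmin <= dmax) and np.all(dmin <= qmax):
                yield i, j

def narrow_phase(pairs, query_ells, data_ells):
    """Exact pass: keep only pairs whose ellipsoids truly intersect.

    ellipsoids_intersect is the exact test sketched further below.
    """
    return [(i, j) for i, j in pairs
            if ellipsoids_intersect(query_ells[i], data_ells[j])]
```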

The narrow phase is a straightforward computational geometry problem: testing for intersection between two arbitrary ellipsoids. For the broad phase you'll want a spatial structure such as a spatial hash, a spatial tree (R-tree, Kd-tree, X-tree, UB-tree, etc.), or an ad-hoc structure that exploits some special properties of the data you're loading (such as an unbalanced tree or hash).
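For the narrow-phase test, one hedged option (not the only one; algebraic tests based on the characteristic polynomial also exist) is to exploit convexity: minimize the first ellipsoid's quadratic form over the second ellipsoid, and report an intersection exactly when the minimum is <= 1. A sketch with scipy.optimize, under the same (center, matrix) representation as above:

```python
import numpy as np
from scipy.optimize import minimize

def ellipsoids_intersect(e1, e2):
    """True iff ellipsoids e1 = (c1, A1) and e2 = (c2, A2) overlap.

    Minimizes (x-c1)^T A1 (x-c1) subject to x lying inside e2. Both
    the objective and the feasible set are convex, so a local solver
    such as SLSQP finds the global minimum; overlap iff minimum <= 1.
    """
    (c1, A1), (c2, A2) = e1, e2
    obj = lambda x: (x - c1) @ A1 @ (x - c1)
    # Constraint fun(x) >= 0 keeps x inside e2.
    con = {'type': 'ineq',
           'fun': lambda x: 1.0 - (x - c2) @ A2 @ (x - c2)}
    res = minimize(obj, x0=c2, constraints=[con], method='SLSQP')
    return res.fun <= 1.0
```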

The currently popular method is a Kd-tree. There's a lot of documentation and plenty of ready-made, configurable Kd-tree implementations available, so I recommend you look online. (Google is your friend on this one.) The advantage of most tree structures is that if the set you're looking for intersections with is relatively compact, you can find all the intersections in a single traversal of the tree rather than performing one traversal per query ellipsoid. This helps with cache access patterns (whether the tree lives in main memory or on disk). The same algorithm can also handle queries whose members are scattered, but it's likely your workload would benefit greatly from the compact-query-set property.
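One hedged way to press an off-the-shelf Kd-tree into service for the broad phase: index only the ellipsoid centers, and note that two ellipsoids can intersect only if their centers are within r1 + r2 of each other, where r is each ellipsoid's longest semi-axis (1/sqrt(lambda_min) of its shape matrix). Querying with that conservative radius yields a candidate set for the narrow phase. A sketch using scipy.spatial.cKDTree (the representation and helper names are my assumptions):

```python
import numpy as np
from scipy.spatial import cKDTree

def max_semi_axis(A):
    """Longest semi-axis of {x : x^T A x <= 1} is 1/sqrt(lambda_min)."""
    return 1.0 / np.sqrt(np.linalg.eigvalsh(A)[0])  # eigvalsh: ascending

def kdtree_broad_phase(query_ells, data_ells):
    """Conservative candidate pairs via a Kd-tree on the centers.

    Two ellipsoids can only intersect if their centers are within
    r_query + r_data of each other, so the query radius is padded by
    the largest semi-axis found anywhere in the data set.
    """
    centers = np.array([c for c, _ in data_ells])
    tree = cKDTree(centers)
    pad = max(max_semi_axis(A) for _, A in data_ells)
    pairs = []
    for i, (c, A) in enumerate(query_ells):
        for j in tree.query_ball_point(c, max_semi_axis(A) + pad):
            pairs.append((i, j))
    return pairs
```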

A Kd-tree won't fix your problems for all ellipsoids. For example, an N-dimensional ellipsoid whose primary axis runs from (0, 0, 0, 0, ...) to (1, 1, 1, 1, ...) but whose secondary axes are small or inconsequential (so it doesn't actually intersect much) still needs to live in a node that covers [0,1] in all N dimensions. If your ellipsoids fall in [0,1]^N, then every ellipsoid will be tested for intersection against that inconvenient one. However, with real-world data (and even most synthetic data, unless you're really trying hard to make Kd-trees slow) the Kd-tree approach should be a win.

If you expect to be working with thousand-dimensional ellipsoids, chances are you're better off with a brute-force search (the aforementioned curse of dimensionality). However...

One million entries isn't too bad if you've got an optimized implementation, but if you're doing a lot of queries (millions), brute force is going to be slow (on the order of 10 seconds or worse). That said, I've seen some amazing numbers come out of well-optimized vectorized code (and have even shipped products using this strategy). With the right cache coherency, brute-forcing a query takes only milliseconds at most. That means either assembly or vector intrinsics in C/C++ -- not sure which language you're working in.
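To illustrate the data-parallel shape of that brute force (in NumPy rather than C/C++ intrinsics, but the memory layout and branchless inner test are the same idea), here is a sketch that checks one query AABB against a million stored AABBs in a single vectorized pass; the struct-of-arrays layout and names are my assumptions:

```python
import numpy as np

# Struct-of-arrays layout: one (1_000_000, N) array of box minima and
# one of box maxima keeps the inner test branchless and cache-friendly.
N = 3
rng = np.random.default_rng(0)
centers = rng.uniform(0.0, 100.0, size=(1_000_000, N))
halves = rng.uniform(0.1, 1.0, size=(1_000_000, N))
mins, maxs = centers - halves, centers + halves

def brute_force_candidates(qmin, qmax):
    """Indices of all stored AABBs overlapping the query AABB.

    One fused comparison over the whole data set; the per-element
    test has no branches, which is what vector units like.
    """
    hit = np.all((mins <= qmax) & (maxs >= qmin), axis=1)
    return np.flatnonzero(hit)

candidates = brute_force_candidates(np.zeros(N), np.ones(N))
```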

For most data, the "O" complexity (disregarding the curse of dimensionality) should be roughly amortized O(m log n) per query once the tree is built, where m is the number of ellipsoids in the query set and n is the number of ellipsoids in the data set. Building the tree itself shouldn't be worse than O(n log n). Then multiply everything by exp(d), where d is the dimension -- that's the way it goes with this sort of thing.

answered Oct 28 '22 by Kaganar