I know how to find whether a ray rooted at a 2D point of interest (POI) intersects a given 2D segment and how for a given ray find effectively the closest to the POI intersection with multiple segments.
However, I have several (not too many) rays rooted at each of many (and I mean many) POIs and many (and I mean many again) segments, and I need to find the closest to the corresponding POI intersection with segments for each of the rays.
Here's what I have in mind for this thing: treat rays as segments with one side at the corresponding POI and the other side very very far in the right direction, then run sweep line algorithm to find all intersections, then for each ray output the intersection closest to POI. This should run in O(N log N), and therefore be kinda sorta good enough.
However, this part happens to be a bottleneck for the whole system, and I'd like to do better than kinda sorta good enough. In particular, because there really are many objects to sort through I'd like to make this algorithm parallel. Unfortunately, sweep line algorithm seems to be inherently sequential, and living with that would impair overall performance of my otherwise fairly parallel system.
Therefore the question: what would you suggest for effective parallelization of the problem described above?
Also, are there any known highly parallel intersection detection algorithms capable of taking advantage of CUDA level of parallelism?
You could try to do an algorithm based on a quadtree or some other spatially indexed 2d data structure.
Here is the idea:
In each leaf of the quadtree, you store a reference to any segment which intersects that square. To start, you have zero resolution, i.e. all of the segments are in the bounding box node, which is not subdivided at all.
Then, for each ray, look at the origin of the ray and subdivide the node which contains it until the node intersects only one segment. If after a subdivision the node does not intersect any segments, move on to the next node by following the ray. If the node intersects a segment inside the node, we have found the first intersection, and can stop the search along this ray. If the ray does not intersect the segment inside the node, we can prune that segment from the search, and move on to the next node that the ray intersects.
This algorithm can obviously be run in parallel, by performing the search along each ray in parallel. There will still be the shared quadtree, but as it becomes subdivided, very quickly the threads will not compete for access.
Running Time:
Running time is a bit hard to analyze, because it depends on the 2d structure of the segments and rays. Perhaps someone can help do a better job at analyzing the running time, but here's my best shot:
If there is a small distance d, which is smaller than the distance of any ray from the closest non-intersecting segment, and smaller than the distance between any two intersection points on the same ray, then we can get an upper bound. Then we can call r=D/d the "resolution" of the problem, where D is the dimension of the bounding box.
The running time will be bounded by N * r But this is an extremely rough bound, because it basically assumes that every ray has to be resolved to resolution d along its entire length.
Let's try to refine this bound.
For any ray, consider the section of the ray between its origin and its first intersection. For any point along that section of the ray, we can ask which non-intersecting segment is the closest to that point. We can then ask how many distinct non-intersecting segments are the closest to some point along this section of the ray. In other words, if we made a Voronoi diagram of the non-intersecting segments, we are asking how many cells would this section of the ray intersect. If we let p be the maximum answer to this question for any ray, then we can get a better upper bound
N*p*log(r)
by considering the maximum required resolution while searching along the ray.
So if p and r are constant, we have a linear time algorithm!
Unfortunately, I'm pretty sure that p, N and r have some relationship that prevents p and r from remaining roughly constant. As a rough guess, I'd say that r ~ sqrt(N), meaning it's back to N log N running time.
Parallelization is interesting. If we assume C cores, we have the search running time of N/C*p*log(r), but there is still the bottleneck of dividing the subtree. There are log(r) levels, and each segment has to be divided to this level at at most N locations, so it will take N/C*log(r) time, meaning the total running time will be
(N/C)*p*log(r)
Which basically means it's perfectly parallelizable assuming C < N.
Conclusion It seems to me that as a single-core algorithm this would probably not be so good, but on problems with special structure, and/or with parallelization, it may work well. I wouldn't recommend trying to implement this algorithm unless you've thought about it a lot, and you're sure that the rough edges can be smoothed out, and I didn't make a mistake with my analysis. I also suggest searching for published algorithms based on similar concepts.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With