I'm looking for an algorithm that tests whether 2 line segments are intersecting in a GPU-friendly way. The line segments are in 2D. While there are many algorithms discussed on the web for doing this, all the ones I have seen use a lot of branching instructions. On a CPU, this is not such a problem. However, on a GPU, a lot of branching instructions will kill performance.
Does anyone know of a good algorithm for this environment? Any sample pseudo-code, CUDA code, OpenCL code, or DirectCompute compute would be appreciated.
FOLLOW UP
In case anyone is interested, this is (basically) what I ended up with. I trimmed it down, so it's more like pseudo-code than OpenCL C, but hopefully it gets the gist across.
__kernel void findCrossingLines(__global float2 p1, __global float2 p2,
__global float2 p3, __global float2 p4,
__global bool* output)
{
int result = 0;
float2 A = p2;
float2 a = p2 - p1;
float2 B = p4;
float2 b = p4 - p3;
float2 c = B - A;
float2 b_perp = (float2)(-b.y, b.x);
float numerator = dot(b_perp, c);
float denominator = dot(b_perp, a);
bool isParallel = (denominator == 0.0f);
float quotient = numerator / denominator;
float2 intersectionPoint = (float2)(quotient * a.x + A.x, quotient * a.y + A.y);
*output = (!isParallel &&
intersectionPoint.x >= min(p1.x, p2.x) &&
intersectionPoint.x >= min(p3.x, p4.x) &&
intersectionPoint.x <= max(p1.x, p2.x) &&
intersectionPoint.x <= max(p3.x, p4.x) &&
intersectionPoint.y >= min(p1.y, p2.y) &&
intersectionPoint.y >= min(p3.y, p4.y) &&
intersectionPoint.y <= max(p1.y, p2.y) &&
intersectionPoint.y <= max(p3.y, p4.y));
}
This is outside my area of expertise, but the following thread on the NVIDIA CUDA forums discussed this topic and contains some code that may provide a useful starting point:
https://forums.developer.nvidia.com/t/intersecting-line-segments-on-gpu/18832
Note that code does not have to be completely branchless to be efficient on NVIDIA GPUs. The architecture supports predication as well as "select"-type instructions (akin to the ternary operator in C/C++), and the compiler is pretty good at mapping local branching to those branch-less constructs. I would recommend using cuobjdump to check the generated machine code, that way you'll know precisely how many branches are actually being used.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With