I'm using OpenCL to find the nearest neighbour between two set of 3D points.
Nearest Neighbour: For each point(x,y,z) in the DataSet I have to find the nearest one in the model. Squared distance = (Ax-Bx)^2 + (Ay-By)^2 + (Az-Bz)^2
Here what I've done so far:
struct point {
int x;
int y;
int z;
};
__kernel void
nearest_neighbour(__global struct point *model,
__global struct point *dataset,
__global int *nearest,
const unsigned int model_size)
{
int g_dataset_id = get_global_id(0);
int dmin = -1;
int d, dx, dy, dz;
for (int i=0; i<model_size; ++i) {
dx = model[i].x - dataset[g_dataset_id].x;
dx = dx * dx;
dy = model[i].y - dataset[g_dataset_id].y;
dy = dy * dy;
dz = model[i].z - dataset[g_dataset_id].z;
dz = dz * dz;
d = dx + dy + dz;
if(dmin == -1 || d < dmin)
{
nearest[g_dataset_id] = i;
dmin = d;
}
}
}
The code seems to work, but I'm sure that it can be optimized. I would like to know how can I take advantage of the local memory to make it better.
Thanks
P.S. I know that there are other (better) methods to find nearest neighbour, like kd-tree, but for now I would like to do the easy one.
The compiler is probably hoisting these loop-invariants for you, but to be sure it gets done, try this code which assigns them to temporaries named datum_x
and so on. Also, initializing dmin to MAX_INT
allows you to avoid the superfluous comparison with -1
. Another approach is to unroll the first loop iteration (with i=0
) in order to initialize dmin.
int dmin = MAX_INT;
int d, dx, dy, dz;
int datum_x, datum_y, datum_z;
datum_x = dataset[g_model_id].x;
datum_y = dataset[g_model_id].y;
datum_z = dataset[g_model_id].z;
for (int i=0; i<size_dataset; ++i) {
dx = model[i].x - datum_x;
dx = dx * dx;
dy = model[i].y - datum_y;
dy = dy * dy;
dz = model[i].z - datum_z;
dz = dz * dz;
d = dx + dy + dz;
if(d < dmin)
{
nearest[g_dataset_id] = i;
dmin = d;
}
}
Maybe a quick pre-filter can speed things up. Instead of calculating the squared distance immediately, you can first check if distance in all three coordinates are closer than dmin. So, you can replace your inner loop with
{
dx = model[i].x - datum_x;
if (abs(dx) >= dmin) continue;
dy = model[i].y - datum_y;
if (abs(dy) >= dmin) continue;
dz = model[i].z - datum_z;
if (abs(dz) >= dmin) continue;
dx = dx * dx;
dy = dy * dy;
dz = dz * dz;
d = dx + dy + dz;
if(d < dmin)
{
nearest[g_dataset_id] = i;
dmin = d;
}
}
I am not sure if the extra calls to the abs()
and the if
s per point will filter out enough data points, but it is a simple enough change to try out, I think.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With