Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for detecting "clusters" of dots [closed]

I have a 2D area with "dots" distributed on this area. I now am trying to detect "clusters" of dots, that is, areas with a certain high density of dots.

Any thoughts on (or links to articles with thoughts on) how to elegantly detect these areas?

like image 876
Epaga Avatar asked Dec 10 '08 13:12

Epaga


People also ask

Which algorithm is used for clustering?

k-means is the most widely-used centroid-based clustering algorithm. Centroid-based algorithms are efficient but sensitive to initial conditions and outliers.

What are the two algorithms for clustering?

There are two types of Clustering Algorithms: Bottom-up and Top-down. Bottom-up algorithms regard data points as a single cluster until agglomeration units clustered pairs into a single cluster of data points.

How does the DBSCAN algorithm find the clusters?

The algorithm proceeds by arbitrarily picking up a point in the dataset (until all points have been visited). If there are at least 'minPoint' points within a radius of 'ε' to the point then we consider all these points to be part of the same cluster.

What is clustering algorithm with example?

Here, the cluster center i.e. centroid is formed such that the distance of data points is minimum with the center. This problem is basically one of the NP-Hard problems and thus solutions are commonly approximated over a number of trials. For Ex- K – means algorithm is one of the popular examples of this algorithm.


2 Answers

How about defining an arbitrary resolution for your space, and calculate for each point in that matrix, a measure of the distance from that point to all dots, then you could make a "heat graph" and use a threshold to define the clusters.

It's a nice exercise for processing, maybe later I will post a solution.

EDIT:

Here it is:

//load the image PImage sample; sample = loadImage("test.png"); size(sample.width, sample.height); image(sample, 0, 0); int[][] heat = new int[width][height];  //parameters int resolution = 5; //distance between points in the gridq int distance = 8; //distance at wich two points are considered near float threshold = 0.5; int level = 240; //leven to detect the dots int sensitivity = 1; //how much does each dot matters  //calculate the "heat" on each point of the grid color black = color(0,0,0); loadPixels(); for(int a=0; a<width; a+=resolution){   for(int b=0; b<height; b+=resolution){     for(int x=0; x<width; x++){       for(int y=0; y<height; y++){         color c = sample.pixels[y*sample.width+x];                 /**          * the heat should be a function of the brightness and the distance,           * but this works (tm)          */         if(brightness(c)<level && dist(x,y,a,b)<distance){           heat[a][b] += sensitivity;         }       }     }   } }  //render the output for(int a=0; a<width; ++a){   for(int b=0; b<height; ++b){     pixels[b*sample.width+a] = color(heat[a][b],0,0);   } } updatePixels(); filter(THRESHOLD,threshold); 

EDIT 2 (slighly less inefficient code but same output):

//load the image PImage sample; sample = loadImage("test.png"); size(sample.width, sample.height); image(sample, 0, 0); int[][] heat = new int[width][height]; int dotQ = 0; int[][] dots = new int[width*height][2]; int X = 0; int Y = 1;   //parameters int resolution = 1; //distance between points in the grid int distance = 20; //distance at wich two points are considered near float threshold = 0.6; int level = 240; //minimum brightness to detect the dots int sensitivity = 1; //how much does each dot matters  //detect all dots in the sample loadPixels(); for(int x=0; x<width; x++){  for(int y=0; y<height; y++){    color c = pixels[y*sample.width+x];    if(brightness(c)<level) {        dots[dotQ][X] += x;        dots[dotQ++][Y] += y;    }  } }  //calculate heat for(int x=0; x<width; x+=resolution){  for(int y=0; y<height; y+=resolution){    for(int d=0; d<dotQ; d++){      if(dist(x,y,dots[d][X],dots[d][Y]) < distance)        heat[x][y]+=sensitivity;    }  } }  //render the output for(int a=0; a<width; ++a){  for(int b=0; b<height; ++b){    pixels[b*sample.width+a] = color(heat[a][b],0,0);  } } updatePixels(); filter(THRESHOLD,threshold);  /** This smooths the ouput with low resolutions * for(int i=0; i<10; ++i) filter(DILATE); * for(int i=0; i<3; ++i) filter(BLUR); * filter(THRESHOLD); */ 

And the output with (a reduced) Kent sample:

like image 115
krusty.ar Avatar answered Oct 09 '22 16:10

krusty.ar


I would suggest using a mean-shift kernel to find the density center of your dots.

Mean-shift illustration http://cvr.yorku.ca/members/gradstudents/kosta/compvis/file_mean_shift_path.gif

This figure shows a mean-shift kernel (centered initially on the edge of the cluster) converge towards the cluster's point of highest density.

In theory (in a nutshell):

Several answers to this questions already hinted at the mean-shift way of doing it:

  • P Daddy's blurring the image and finding the darkest spot is actually a kernel density estimation (KDE) method, which is the theoretical basis of the mean-shift.

  • Both j0rd4n and Bill the Lizard suggested to discretize your space into blocks and inspect their densities.

What you see in the animated figure is a combination of these two suggestions: it uses a moving "block" (i.e. the kernel) to seek the locally highest density.

The mean-shift is an iterative method that uses a pixel neighborhood called the kernel (similar to this one) and uses it to compute the mean of the underlying image data. The mean in this context is the pixel-weighted average of the kernel coordinates.

In each iteration the kernel's mean defines its center coordinates for the next iteration - this is called the shift. Hence the name mean-shift. The stop condition for the iterations is when the shift distance drops to 0 (i.e. we are at the most dense spot in the neighborhood).

A comprehensive introduction to mean-shift (both in theory and application) can be found in this ppt presentation.

In practice:

An implementation of the mean-shift is available in OpenCV:

int cvMeanShift( const CvArr* prob_image, CvRect window,                  CvTermCriteria criteria, CvConnectedComp* comp ); 

O'Reilly's Learning OpenCv (google book excerpts) also has a nice explanation on how it works. Basically just feed it your dots image (prob_image).

In practice, the trick is to choose the adequate kernel size. The smaller the kernel, the closer you need to initiate it to the cluster. The bigger the kernel, the more random your initial position can be. However, if there are several clusters of dots in your image, the kernel might converge right between them.

like image 23
Ivan Avatar answered Oct 09 '22 16:10

Ivan