
How to use Apache's DBSCANClusterer

I have a distance matrix like the one described in this question:

Clustering with a distance matrix

Now I would like to run DBSCAN on this matrix using the DBSCANClusterer.java class from Apache.

The 'cluster' method takes a collection of points as input. What is the format of these points?

Referring to the above matrix, what do I add to the collection parameter?

Can someone please paste a code snippet? I would like to specify the distances as:

A,B : 20
A,C : 20
. . .

Once the clustering is done, similar samples should end up in the same cluster.

Nikhil, asked Nov 25 '13 09:11


1 Answer

Hope this helps.

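import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.math3.ml.clustering.Cluster;
import org.apache.commons.math3.ml.clustering.DBSCANClusterer;
import org.apache.commons.math3.ml.clustering.DoublePoint;

public class App {

    public static void main(String[] args) throws IOException {
        File[] files = getFiles("./files2/");

        // eps = 0.05 (neighborhood radius), minPts = 50 (neighbors needed for a dense region)
        DBSCANClusterer<DoublePoint> dbscan = new DBSCANClusterer<DoublePoint>(.05, 50);
        List<Cluster<DoublePoint>> clusters = dbscan.cluster(getGPS(files));

        // Print the first point of each cluster
        for (Cluster<DoublePoint> c : clusters) {
            System.out.println(c.getPoints().get(0));
        }
    }

    private static File[] getFiles(String dir) {
        return new File(dir).listFiles();
    }

    private static List<DoublePoint> getGPS(File[] files) throws IOException {
        List<DoublePoint> points = new ArrayList<DoublePoint>();
        for (File f : files) {
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader in = new BufferedReader(new FileReader(f))) {
                String line;
                while ((line = in.readLine()) != null) {
                    try {
                        // Fields: timestamp, first coordinate, second coordinate
                        String[] fields = line.split(",");
                        double[] d = new double[2];
                        d[0] = Double.parseDouble(fields[1]);
                        d[1] = Double.parseDouble(fields[2]);
                        points.add(new DoublePoint(d));
                    } catch (ArrayIndexOutOfBoundsException e) {
                        // Skip lines with fewer than three fields
                    } catch (NumberFormatException e) {
                        // Skip lines whose coordinates are not numeric
                    }
                }
            }
        }
        return points;
    }
}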

Sample Data:

12-01-99 11:31:01 AM, -40.010, -70.020
12-01-99 11:32:01 AM, -41.010, -71.020
12-01-99 11:33:01 AM, -42.010, -72.020
12-01-99 11:34:01 AM, -43.010, -73.020
12-01-99 11:35:01 AM, -40.010, -74.020
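
To make the mapping from a data line to a point concrete, here is a small standalone sketch of the parsing step, using only the standard library. The class name GpsLineParser and the method parse are made up for illustration; the code above does the same thing inline and wraps the resulting double[] in a DoublePoint.

```java
// Hypothetical helper for illustration; not part of the answer's App class.
class GpsLineParser {

    // Returns the two coordinates from "timestamp, x, y", or null if the line is malformed.
    static double[] parse(String line) {
        String[] fields = line.split(",");
        if (fields.length < 3) {
            return null; // fewer than three comma-separated fields
        }
        try {
            return new double[] {
                Double.parseDouble(fields[1].trim()),
                Double.parseDouble(fields[2].trim())
            };
        } catch (NumberFormatException e) {
            return null; // a coordinate was not numeric
        }
    }

    public static void main(String[] args) {
        double[] p = parse("12-01-99 11:31:01 AM, -40.010, -70.020");
        System.out.println(p[0] + ", " + p[1]);
    }
}
```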

All the files are in a folder called files2, whose path is passed to the getFiles method.
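
The code above clusters coordinate points rather than a precomputed distance matrix. For the matrix case, the following is a rough, library-free sketch of DBSCAN that reads distances straight from a double[][]. It is not the Commons Math implementation: the class MatrixDbscan, its methods, and the distance values are all made up for illustration, and it assumes a point does not count as its own neighbor. With the library itself, one option might be the DBSCANClusterer constructor that also accepts a DistanceMeasure, with each point holding only its row index, but that variant is not shown here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of Apache Commons Math.
class MatrixDbscan {
    static final int UNVISITED = -2;
    static final int NOISE = -1;

    // Made-up distances for five samples A..E (symmetric, zero diagonal).
    static final double[][] EXAMPLE = {
        { 0,  2,  3, 20, 21},
        { 2,  0,  2, 20, 22},
        { 3,  2,  0, 21, 20},
        {20, 20, 21,  0,  1},
        {21, 22, 20,  1,  0}
    };

    // Returns one label per sample: 0, 1, ... for clusters, -1 for noise.
    static int[] cluster(double[][] dist, double eps, int minPts) {
        int[] label = new int[dist.length];
        Arrays.fill(label, UNVISITED);
        int nextId = 0;
        for (int i = 0; i < dist.length; i++) {
            if (label[i] != UNVISITED) continue;
            List<Integer> seeds = neighbors(dist, i, eps);
            if (seeds.size() < minPts) {   // not dense enough to start a cluster
                label[i] = NOISE;
                continue;
            }
            int id = nextId++;
            label[i] = id;
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int q = queue.poll();
                if (label[q] == NOISE) label[q] = id;  // noise becomes a border point
                if (label[q] != UNVISITED) continue;   // already assigned
                label[q] = id;
                List<Integer> more = neighbors(dist, q, eps);
                if (more.size() >= minPts) queue.addAll(more); // q is a core point: expand
            }
        }
        return label;
    }

    // Indices j != i whose distance to i is at most eps.
    static List<Integer> neighbors(double[][] dist, int i, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < dist.length; j++) {
            if (j != i && dist[i][j] <= eps) result.add(j);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] labels = cluster(EXAMPLE, 5.0, 1);
        String names = "ABCDE";
        for (int i = 0; i < labels.length; i++) {
            System.out.println(names.charAt(i) + " -> cluster " + labels[i]);
        }
    }
}
```

With eps = 5 and minPts = 1, the made-up matrix puts A, B and C in one cluster and D and E in another. Both parameters would need tuning for real distances.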

Dan Ciborowski - MSFT, answered Nov 13 '22 06:11