I have a distance matrix, as described in this question:
Clustering with a distance matrix
Now I would like to run DBSCAN on this matrix using the `DBSCANClusterer` class from Apache Commons Math.
The `cluster` method takes a collection of points as input. What format should these points have?
Given the matrix above, what do I pass as the collection parameter?
Could someone post a code snippet? I would like to specify the distances as:
A,B : 20
A,C : 20
...
Once the clustering is done, similar samples should end up in the same cluster.
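A minimal sketch of turning pairwise entries written as `A,B : 20` into a symmetric distance matrix that a clusterer could look up by index. The class name `PairwiseDistances`, the one-pair-per-line format, and the choice to treat unlisted pairs as infinitely far apart are assumptions made for illustration, not part of the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: builds a symmetric distance matrix from "X,Y : d" lines. */
public class PairwiseDistances {
    /** Sample label -> row/column index, in order of first appearance. */
    public final Map<String, Integer> index = new LinkedHashMap<>();
    public final double[][] matrix;

    public PairwiseDistances(List<String> lines) {
        List<String[]> pairs = new ArrayList<>();
        List<Double> values = new ArrayList<>();
        for (String line : lines) {
            // Assumed format: "A,B : 20" (two labels, a colon, then the distance)
            String[] sides = line.split(":");
            String[] labels = sides[0].split(",");
            String a = labels[0].trim();
            String b = labels[1].trim();
            index.putIfAbsent(a, index.size());
            index.putIfAbsent(b, index.size());
            pairs.add(new String[] {a, b});
            values.add(Double.parseDouble(sides[1].trim()));
        }
        int n = index.size();
        matrix = new double[n][n];
        for (int i = 0; i < n; i++) {
            // Assumption: unlisted pairs are infinitely far apart (never neighbours).
            Arrays.fill(matrix[i], Double.POSITIVE_INFINITY);
            matrix[i][i] = 0.0; // a sample is at distance 0 from itself
        }
        for (int k = 0; k < pairs.size(); k++) {
            int i = index.get(pairs.get(k)[0]);
            int j = index.get(pairs.get(k)[1]);
            // Distances are symmetric, so fill both cells.
            matrix[i][j] = values.get(k);
            matrix[j][i] = values.get(k);
        }
    }

    public static void main(String[] args) {
        PairwiseDistances d = new PairwiseDistances(Arrays.asList("A,B : 20", "A,C : 20"));
        System.out.println(d.index);
        System.out.println(Arrays.deepToString(d.matrix));
    }
}
```

Filling unlisted pairs with infinity rather than the default `0.0` avoids silently treating two samples as identical just because their distance was left out.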
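Hope this helps. The example below reads 2-D GPS points from a folder of text files and clusters them; because the points carry coordinates, the clusterer's default Euclidean distance is used instead of a precomputed matrix.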
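```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.math3.ml.clustering.Cluster;
import org.apache.commons.math3.ml.clustering.DBSCANClusterer;
import org.apache.commons.math3.ml.clustering.DoublePoint;

public class App {
    public static void main(String[] args) throws IOException {
        File[] files = getFiles("./files2/");
        // eps = 0.05 (same units as the coordinates), minPts = 50
        DBSCANClusterer<DoublePoint> dbscan = new DBSCANClusterer<DoublePoint>(.05, 50);
        List<Cluster<DoublePoint>> clusters = dbscan.cluster(getGPS(files));
        // Print the first point of each cluster; noise points belong to no cluster.
        for (Cluster<DoublePoint> c : clusters) {
            System.out.println(c.getPoints().get(0));
        }
    }

    private static File[] getFiles(String dir) {
        return new File(dir).listFiles();
    }

    // Reads "timestamp, latitude, longitude" lines into 2-D points.
    private static List<DoublePoint> getGPS(File[] files) throws IOException {
        List<DoublePoint> points = new ArrayList<DoublePoint>();
        for (File f : files) {
            // try-with-resources closes each reader, which the original never did
            try (BufferedReader in = new BufferedReader(new FileReader(f))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] fields = line.split(",");
                    try {
                        double[] d = new double[2];
                        d[0] = Double.parseDouble(fields[1]); // latitude
                        d[1] = Double.parseDouble(fields[2]); // longitude
                        points.add(new DoublePoint(d));
                    } catch (ArrayIndexOutOfBoundsException | NumberFormatException e) {
                        // Skip malformed or header lines.
                    }
                }
            }
        }
        return points;
    }
}
```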
Sample data:
```
12-01-99 11:31:01 AM, -40.010, -70.020
12-01-99 11:32:01 AM, -41.010, -71.020
12-01-99 11:33:01 AM, -42.010, -72.020
12-01-99 11:34:01 AM, -43.010, -73.020
12-01-99 11:35:01 AM, -40.010, -74.020
```
Put all the data files in a folder called `files2`; its path is the argument passed to `getFiles` in `main`.
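To cluster straight from a precomputed distance matrix, as the question asks, one option is to give `DBSCANClusterer` a custom `DistanceMeasure`. This is a sketch under assumptions: it relies on the commons-math3 `ml.clustering` API (the three-argument `DBSCANClusterer(eps, minPts, measure)` constructor), and the class names `MatrixDbscan`, `IndexedSample`, `MatrixDistance` and the example matrix are hypothetical. Each sample's "point" is just its row index; the distance measure looks the real distance up in the matrix.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.math3.ml.clustering.Cluster;
import org.apache.commons.math3.ml.clustering.Clusterable;
import org.apache.commons.math3.ml.clustering.DBSCANClusterer;
import org.apache.commons.math3.ml.distance.DistanceMeasure;

public class MatrixDbscan {

    /** Hypothetical: a sample identified only by its row/column index in the matrix. */
    static class IndexedSample implements Clusterable {
        final String label;
        final int index;

        IndexedSample(String label, int index) {
            this.label = label;
            this.index = index;
        }

        @Override
        public double[] getPoint() {
            // The "coordinate" is just the index; real distances live in the matrix.
            return new double[] {index};
        }
    }

    /** Hypothetical: a DistanceMeasure that looks distances up instead of computing them. */
    static class MatrixDistance implements DistanceMeasure {
        private static final long serialVersionUID = 1L;
        private final double[][] matrix;

        MatrixDistance(double[][] matrix) {
            this.matrix = matrix;
        }

        @Override
        public double compute(double[] a, double[] b) {
            return matrix[(int) a[0]][(int) b[0]];
        }
    }

    /** Runs DBSCAN over the matrix and returns the labels in each cluster. */
    public static List<List<String>> cluster(String[] labels, double[][] matrix,
                                             double eps, int minPts) {
        List<IndexedSample> samples = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            samples.add(new IndexedSample(labels[i], i));
        }
        DBSCANClusterer<IndexedSample> dbscan =
                new DBSCANClusterer<>(eps, minPts, new MatrixDistance(matrix));
        List<List<String>> result = new ArrayList<>();
        for (Cluster<IndexedSample> c : dbscan.cluster(samples)) {
            List<String> members = new ArrayList<>();
            for (IndexedSample s : c.getPoints()) {
                members.add(s.label);
            }
            result.add(members);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] labels = {"A", "B", "C", "D"};
        // Made-up example distances; D is far from everything else.
        double[][] matrix = {
            {0, 20, 20, 90},
            {20, 0, 15, 95},
            {20, 15, 0, 85},
            {90, 95, 85, 0},
        };
        System.out.println(cluster(labels, matrix, 25, 2));
    }
}
```

`eps` is compared against the matrix entries directly, so it must be in the same units (with distances like 20, `eps` must be at least 20 for A and B to count as neighbours). Samples that end up as noise, like D here, are not included in any returned cluster.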