Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parameter eps of DBSCAN, python

I have a set of points . Their geometry (SRID: 4326) is stored in a Database. I have been given a code that aims to cluster this points with DBSCAN. The parameters have been set as follow: eps=1000, min_points=1.

I obtain clusters that are less distant than 1000 meters. I believed that two points less distant than 1000 meters would belong to the same cluster. Is epsilon really in meters?

The code is the following:

    self.algorithm='DBSCAN'
    X=self.data[:,[2,3]]
    if self.debug==True:
        print 'Nbr of Points: %d'% len(X)
    # print X.shape
    # print dist_matrix.shape
    D = distance.squareform(distance.pdist(X,'euclidean'))
    # print dist_matrix
    # S = 1 - (D / np.max(D))
    db = DBSCAN(eps, min_samples).fit(D)
    self.core_samples = db.core_sample_indices_
    self.labels = db.labels

the aim is not to find another way to run it but really to understand the value of eps. What it represents in term of distance. Min_sample is set to one because I accept to have clusters with a size of 1 sample indeed.

like image 749
user2879969 Avatar asked Apr 16 '26 17:04

user2879969


1 Answers

This depends on your implementation.

Your distance function could return anything; including meters, millimeters, yards, km, miles, degrees... but you did not share what function you use for computing distance! If I'm not mistaken, SRID: 4326 does not imply anything on distance computations.

The "haversine" used by sklearn seems to use degrees, not meters.

Either way, min_points=1 is nonsensical. The query point is included, so every point itself is a cluster. With min_points <= 2, the result of DBSCAN will be a single-linkage clustering. To get a density based clustering, you need to choose a higher value to get real density.

You may want to use ELKI's DBSCAN. According to their Java sources, their distance function uses meters, but also their R*-tree index allows accelerated range queries with this distance, which will yield a substantial speed-up (O(n log n) instead of O(n^2)).

like image 132
Has QUIT--Anony-Mousse Avatar answered Apr 19 '26 06:04

Has QUIT--Anony-Mousse



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!