Clustering GPS data using DBSCAN but clusters are not meaningful (in terms of size)

Tags: r, cluster-analysis, dbscan

I am working with GPS data (latitude, longitude). For density-based clustering I have used DBSCAN in R.

Advantages of DBSCAN in my case:

I don't have to predefine the number of clusters

I can calculate a distance matrix (using the Haversine distance formula) and use that as input to dbscan

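library(fossil)
library(fpc)
# df is the dataset of coordinates; earth.dist() reads longitude from
# column 1 and latitude from column 2, and returns distances in km
d <- earth.dist(df, dist = TRUE)
# eps is in the units of the distance matrix, so 0.43 means 0.43 km
dens <- dbscan(d, MinPts = 25, eps = 0.43, method = "dist")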

Now, when I look at the clusters, they are not meaningful. Some clusters contain points that are more than 1 km apart. I want dense clusters, but not ones that large.

I have tried different values of MinPts and eps, and I have also used a k-nearest-neighbor distance plot to pick a suitable eps for MinPts=25.
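For readers unfamiliar with the k-nearest-neighbor distance plot mentioned above, here is a minimal base R sketch of one common way to build it. The coordinates are synthetic, and setting k equal to MinPts is an assumption (some conventions use MinPts - 1); with real data the matrix would come from the Haversine distances instead.

```r
set.seed(7)
# Synthetic planar coordinates standing in for the GPS points (assumption)
xy <- cbind(x = runif(200), y = runif(200))
dm <- as.matrix(dist(xy))

k <- 25  # assumed equal to MinPts
# Distance from each point to its k-th nearest neighbor;
# index k + 1 skips the zero distance of each point to itself
kdist <- apply(dm, 1, function(row) sort(row)[k + 1])

# Sorted k-NN distances; a "knee" in this curve is a common heuristic for eps
plot(sort(kdist), type = "l",
     xlab = "points sorted by k-NN distance",
     ylab = paste0(k, "-NN distance"))
```

The knee only suggests a density threshold. It says nothing about how far a cluster can extend once dense regions are connected.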

As I understand it, dbscan visits every point in my dataset, and if a point p has at least MinPts points in its eps-neighborhood it starts a cluster. But it also merges clusters that are density-reachable from one another, which I guess is what is causing my problem.

It really is a big question, particularly "how to reduce the size of a cluster without losing too much of its information", but I will break it down into the following points:

1. How do I remove border points from a cluster? I know which points are in which cluster from dens$cluster, but how can I tell whether a particular point is a core or a border point?
2. Is cluster 0 always noise?
3. I was under the impression that the size of a cluster would be comparable to eps. But that's not the case, because density-reachable clusters are merged together.
4. Is there any other clustering method which has the advantages of dbscan but can give me more meaningful clusters?

OPTICS is another alternative, but will it solve my issue?

Note: By meaningful I mean that nearby points should be in the same cluster, while points which are 1 km or more apart should not be.

825

asked Dec 31 '13 11:12

1 Answer

DBSCAN doesn't claim the radius is the maximum cluster size.

Have you read the article? DBSCAN looks for arbitrarily shaped clusters. eps is only the neighborhood radius of a point, roughly the scale used for density estimation; any point within this radius of a core point becomes part of that point's cluster.

This makes eps essentially the maximum step size for connecting dense points. Those points may still form a chain of density-connected points of arbitrary shape and size.
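The chaining effect is easy to reproduce. The following is a small sketch on synthetic data (not the asker's dataset): 60 points on a line, 0.1 km apart, treated as planar coordinates in km. The eps value mirrors the question, while MinPts = 4 is chosen only to suit this toy example.

```r
library(fpc)

# 60 synthetic points on a straight line, spaced 0.1 km apart
xy <- cbind(x = seq(0, by = 0.1, length.out = 60), y = 0)
d <- dist(xy)

# Every point has enough neighbors within 0.43 km, so each step connects
res <- dbscan(d, eps = 0.43, MinPts = 4, method = "dist")

# Largest pairwise distance among points assigned to cluster 1
in1 <- res$cluster == 1
diam <- max(as.matrix(d)[in1, in1])

table(res$cluster)
diam
```

No single step is longer than eps, yet the resulting cluster spans several kilometres end to end, which is the behavior described in the question.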

I don't know what cluster 0 is in your R implementation. I've experimented with the R implementation, but it was much slower than all the others. I don't recommend using R; there are much better tools for cluster analysis available, such as ELKI. Try running DBSCAN with your settings in ELKI, with LatLngDistanceFunction and a sort-tile-recursive bulk-loaded R-tree index. You'll be surprised how fast it can be, compared to R.
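For those staying with fpc: as far as its documentation goes, the object returned by dbscan() codes noise points as cluster 0 and carries a logical vector isseed marking core ("seed") points. Under that reading, core, border and noise points can be separated as in the sketch below; the data are synthetic and the eps and MinPts values are arbitrary choices for this example.

```r
library(fpc)
set.seed(1)

# Synthetic data: one dense blob plus a few scattered stragglers
xy <- rbind(matrix(rnorm(200, sd = 0.1), ncol = 2),
            matrix(runif(20, -3, 3), ncol = 2))
d <- dist(xy)

dens <- dbscan(d, eps = 0.1, MinPts = 10, method = "dist")

# cluster == 0 -> noise; isseed -> core point; remaining members -> border
role <- ifelse(dens$cluster == 0, "noise",
               ifelse(dens$isseed, "core", "border"))
table(role)

# Dropping border points leaves only the core of each cluster
core_only <- xy[dens$isseed, , drop = FALSE]
```

Removing border points trims the fringe of a cluster but does not bound its diameter, since core points can still chain together.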

OPTICS is looking for the same density connected type of clusters. Are you sure this arbitrarily-shaped type of clusters is what you are looking for?

IMHO, you are using the wrong method for your goals (and you aren't really explaining what you are trying to achieve).

If you want a hard limit on the cluster diameter, use complete-linkage hierarchical clustering.
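A minimal sketch of that suggestion in base R, assuming a 1 km diameter limit as in the question. The coordinates here are synthetic planar points in km; with real GPS data the dist object from earth.dist() could be passed to hclust() instead. The minimum size of 5 used to keep "dense" clusters is an arbitrary illustrative threshold.

```r
set.seed(42)
# Synthetic planar coordinates in km (assumption, stands in for GPS data)
xy <- cbind(x = runif(300, 0, 5), y = runif(300, 0, 5))
d <- dist(xy)

# Complete linkage merges clusters at the largest pairwise distance between them
hc <- hclust(d, method = "complete")

# Cutting the tree at height 1 keeps every pairwise distance within a cluster <= 1 km
cl <- cutree(hc, h = 1)

# Check the diameter of every cluster
dm <- as.matrix(d)
diam <- sapply(split(seq_len(nrow(xy)), cl), function(idx) max(dm[idx, idx]))
max(diam)

# Optionally keep only clusters with enough members to count as dense
sizes <- table(cl)
dense <- names(sizes)[sizes >= 5]
```

Unlike DBSCAN, this has no notion of noise, so sparse points end up in small clusters that have to be filtered by size afterwards. It also needs the full distance matrix, which the approach in the question computes anyway.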

113

answered Oct 03 '22 04:10

Has QUIT--Anony-Mousse
