
How to cluster time series using DBSCAN in Python

So I have my data in the form of,

X = [[T1],[T2]..] where Tn is the time series of the nth user.

I want to cluster these time series with DBSCAN, using the scikit-learn library in Python. When I fit the data directly, every object gets the label -1 (noise), no matter which values of epsilon and min-points I try.

What is the correct way to proceed?

Here's my code:

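import numpy as np
from sklearn.cluster import DBSCAN

# X holds the time series described above
db = DBSCAN(eps=0.3, min_samples=10)
db.fit(X)
# Boolean mask marking the core samples
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
# Number of clusters, not counting the noise label -1
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)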
Siddharth Shah asked Mar 02 '26 21:03



1 Answer

Epsilon can be hard to choose by "random search".

It's a distance threshold, so you need to know what a typical distance between your time series is. Right now, your epsilon is clearly too small, because everything in your data set is labeled as noise.
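One way to get a feel for typical distances is to look at quantiles of all pairwise distances before picking epsilon. The following is only a sketch: the data is a synthetic stand-in for X (two made-up groups of 20 users with 50 time steps each), and Euclidean distance is an assumption, not necessarily the right choice for your series.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Synthetic stand-in for X (assumption): 40 users, 50 time steps each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(20, 50)),
    rng.normal(5.0, 1.0, size=(20, 50)),
])

# Full distance matrix; Euclidean is just a placeholder metric here.
D = pairwise_distances(X, metric="euclidean")

# Keep each pair once and drop the zero diagonal.
d = D[np.triu_indices_from(D, k=1)]

# Quantiles show the scale that eps has to live on.
quantiles = np.percentile(d, [5, 25, 50, 75])
print(quantiles)
```

If even the smallest of these distances is far above your eps, as it is for eps=0.3 on this toy data, no point has any neighbors and DBSCAN can only return noise.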

In a map-based application, one might know a good value up front, e.g. "1 mile radius". But for your time series, what do the distances look like? You might not even know yet which distance function to use.
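If you want to try a distance other than the default Euclidean one, scikit-learn's DBSCAN accepts a precomputed distance matrix via metric="precomputed". A minimal sketch, assuming correlation distance (1 minus the Pearson correlation) is meaningful for your data; the sine and cosine groups and the eps value are invented purely for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Hypothetical data: 20 noisy sine-shaped and 20 noisy cosine-shaped series.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
X = np.vstack([
    np.sin(t) + rng.normal(0, 0.1, size=(20, 50)),
    np.cos(t) + rng.normal(0, 0.1, size=(20, 50)),
])

# Correlation distance compares shape and ignores offset and scale.
D = pairwise_distances(X, metric="correlation")

# eps is now on the scale of the chosen distance, not of the raw values.
db = DBSCAN(eps=0.1, min_samples=5, metric="precomputed").fit(D)
labels = db.labels_
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters, labels)
```

Swapping the distance changes what "similar" means, so epsilon has to be re-tuned every time the distance function changes.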

In the original DBSCAN paper, the authors proposed a simple method for choosing epsilon, based on a k-distance plot.
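A rough sketch of that idea with scikit-learn's NearestNeighbors: compute each point's distance to its k-th nearest neighbor, sort the values, and look for the "knee". The data is a synthetic stand-in for X, and setting k equal to min_samples is an assumption made for this example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in for X (assumption): 40 users, 50 time steps each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(20, 50)),
    rng.normal(5.0, 1.0, size=(20, 50)),
])

k = 10  # assumption: use the same value as min_samples

# Querying the training points returns each point as its own first
# neighbor, which matches min_samples counting the point itself.
nn = NearestNeighbors(n_neighbors=k).fit(X)
dist, _ = nn.kneighbors(X)

# Distance to the k-th neighbor for every point, sorted in descending order.
k_dist = np.sort(dist[:, -1])[::-1]
print(k_dist)
```

Plotting k_dist against its index (for example with matplotlib's plt.plot(k_dist)) gives the k-distance graph; a candidate epsilon is the value at the point where the curve bends sharply.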

Has QUIT--Anony-Mousse answered Mar 04 '26 10:03



