So I have my data in the form of,
X = [[T1],[T2]..] where Tn is the time series of nth user.
I want to cluster these time series with DBSCAN from the scikit-learn library in Python. When I fit the data directly, the output is -1 for every object, across various values of epsilon and min_samples.
What is the correct way to proceed?
Here's my code:
import numpy as np
from sklearn.cluster import DBSCAN

db = DBSCAN(eps=0.3, min_samples=10)
db.fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
# Number of clusters, excluding the noise label (-1)
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
Epsilon can be hard to choose by "random search".
It's a distance threshold - you need to know what a typical distance between your time series is. Right now, your epsilon is clearly too small, because everything in your data set is being labeled as noise.
In a map-based application, one might know a good value up front, e.g. a "1 mile radius". But what do distances look like for your time series? You might not even know yet which distance function to use.
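One way to get a feel for your distances is to precompute the full pairwise distance matrix, inspect it, and pass it to DBSCAN with metric='precomputed'. This is a minimal sketch on random data of equal-length series; Euclidean distance is purely a placeholder here (something like dynamic time warping may suit time series better), and the eps choice of half the median distance is an arbitrary starting point, not a recommendation:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))  # hypothetical: 100 series of length 50

# Pairwise distance matrix between all series
D = squareform(pdist(X, metric="euclidean"))

# Inspect typical distances before picking eps
print("median pairwise distance:", np.median(D))

# Cluster on the precomputed matrix; eps chosen relative to the data
db = DBSCAN(eps=np.median(D) / 2, min_samples=5, metric="precomputed").fit(D)
labels = db.labels_
```

The advantage of metric='precomputed' is that you can swap in any distance function you like without it having to be supported by scikit-learn directly.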
In the original DBSCAN paper, the authors proposed a simple method for choosing epsilon, based on a k-distance plot.
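The idea of the k-distance plot: for each point, compute the distance to its k-th nearest neighbor (with k = min_samples), sort these distances, and plot them; the "knee" of the curve is a reasonable eps. A sketch using scikit-learn's NearestNeighbors on stand-in random data (the shapes and k value here are hypothetical):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # hypothetical: 200 series of length 50

k = 10  # match this to your min_samples
nn = NearestNeighbors(n_neighbors=k).fit(X)
distances, _ = nn.kneighbors(X)

# Distance to the k-th neighbor for each point, sorted ascending
k_distances = np.sort(distances[:, -1])
# Plot k_distances (e.g. with matplotlib) and read eps off the knee,
# where the curve bends sharply upward.
```

Points to the left of the knee are in dense regions; the sharp rise marks the transition to noise, which is why the bend is a sensible eps.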