In order to cluster a set of time series I'm looking for a smart distance metric. I've tried some well-known metrics, but none of them fits my case.
ex: Let's assume that my clustering algorithm extracts these three centroids [s1, s2, s3]:
I want to put this new example [sx] in the most similar cluster:
The most similar centroid is the second one, so I need to find a distance function d that gives me d(sx, s2) < d(sx, s1) and d(sx, s2) < d(sx, s3).
edit
Here are the results with the metrics [cosine, euclidean, minkowski, dynamic time warping]:
edit 2
User Pietro P suggested applying the distances to the cumulated version of the time series. The solution works; here are the plots and the metrics:
Nice question! Using any standard distance of R^n (Euclidean, Manhattan or, generically, Minkowski) over those time series cannot achieve the result you want, since those metrics are independent of permutations of the coordinates of R^n (while time is strictly ordered, and that is exactly the phenomenon you want to capture).
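To see that concretely, here is a tiny sketch with made-up numbers: reordering the time steps of both series in the same way leaves a standard metric such as the Euclidean distance unchanged, so these metrics are blind to temporal order.

```python
import numpy as np
from scipy.spatial.distance import euclidean

# Two toy series (made-up values, just for illustration)
a = np.array([0.0, 1.0, 3.0, 2.0])
b = np.array([1.0, 2.0, 0.0, 3.0])

# Reorder the time steps of both series in the same way
perm = np.array([2, 0, 3, 1])

print(euclidean(a, b))              # ~3.4641
print(euclidean(a[perm], b[perm]))  # identical: the metric ignores time order
```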
A simple trick that does what you ask is to use the cumulated version of the time series (sum the values over time as time increases) and then apply a standard metric. Using the Manhattan metric, the distance between two time series is the area between their cumulated versions.
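A minimal sketch of that trick (the centroids s1, s2, s3 and the new series sx below are hypothetical placeholders, not the ones from the question): cumulate with np.cumsum, then compare with the Manhattan (cityblock) distance.

```python
import numpy as np
from scipy.spatial.distance import cityblock  # Manhattan / L1 distance

def cumulated_distance(x, y):
    # Manhattan distance between the running sums of two equal-length
    # series, i.e. the area between their cumulated versions
    return cityblock(np.cumsum(x), np.cumsum(y))

# Hypothetical centroids and a new series to assign (placeholder data)
s1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])  # rising ramp
s2 = np.array([0.0, 2.0, 4.0, 2.0, 0.0, 0.0])  # bump
s3 = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.0])  # falling ramp
sx = np.array([0.0, 1.0, 4.0, 3.0, 0.0, 0.0])  # bump-like, should land in s2

dists = {name: cumulated_distance(sx, s)
         for name, s in [("s1", s1), ("s2", s2), ("s3", s3)]}
print(dists)
print("closest centroid:", min(dists, key=dists.get))  # -> s2 on this toy data
```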
Another approach would be to use DTW, which is an algorithm to compute the similarity between two temporal sequences. Full disclosure: I coded a Python package for this purpose called trendypy, which you can download via pip (pip install trendypy). Here is a demo of how to use the package. You basically compute the total minimum distance for different combinations to set the cluster centers.
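The trendypy demo itself isn't reproduced here; as a rough, package-agnostic sketch of the same idea, below is a hand-rolled DTW distance (a simple reference implementation, not trendypy's code) used to assign a new series to the closest of three hypothetical centroids.

```python
import numpy as np

def dtw_distance(x, y):
    # Classic O(len(x) * len(y)) dynamic time warping with an
    # absolute-difference local cost and no warping window
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Hypothetical cluster centroids and a new series (placeholder data)
centroids = {
    "s1": np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0]),
    "s2": np.array([0.0, 2.0, 4.0, 2.0, 0.0, 0.0]),
    "s3": np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.0]),
}
sx = np.array([0.0, 1.0, 4.0, 3.0, 0.0, 0.0])

dists = {name: dtw_distance(sx, c) for name, c in centroids.items()}
print(dists)
print("assigned cluster:", min(dists, key=dists.get))
```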