
How to get the result of dtwclust

I'm now using the dtwclust package (thanks to author Alexis Sarda-Espinosa & Alexis Sarda).

I'm stuck on an easy issue. Here is my code.

sc <- read.table("D:/handling data/confirm.csv", header=T, sep="," )
rownames(sc) <- sc$STDR_YM_CD
sc$STDR_YM_CD <- NULL
sc <- t(sc)
hc_sbd <- dtwclust(sc, type = 'h', k=3L, method = 'average', preproc = zscore,
               distance = 'dtw', control = list(trace=TRUE) )

plot(hc_sbd@cluster)
plot(hc_sbd, type='sc')
plot(hc_sbd, type='series', clus=2)
plot(hc_sbd, type='centroids', clus=2)

head(hc_sbd)
write.xlsx(hc_sbd, "D:/handling data/tab1clustn.xlsx")

I got the first picture, and I would like to export my data with cluster labels, like the second picture.

[image: plot obtained from the code above] [image: desired export with cluster labels]

Here's a link to my data: http://blogattach.naver.com/e772fb415a6c6ddafd137d427d9ee7953f6e9146/20170207_141_blogfile/khm2963_1486442387926_THgZRt_csv/confirm.csv?type=attachment

김지영 asked Mar 10 '23 21:03


2 Answers

The answer from @Wayne Lee is overdoing it. There is no need to declare a data.frame, and we do not need to merge the data.

Every clustering algorithm I know of returns a cluster assignment vector, cluster, with one entry per row of the data df that was clustered. Therefore, just cbind the cluster vector to your data df:

add_cluster_to_csv<-cbind(df,cluster=hc_sbd@cluster)

This should also reduce computation time, since it avoids merge, and cbind is much faster than building a data.frame.
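To make the cbind idea concrete without needing the original file, here is a minimal sketch on toy data. Everything in it is an assumption for illustration: the series names, the values, and the use of base R's hclust/cutree as a stand-in for dtwclust (with dtwclust the label vector would be hc_sbd@cluster instead).

```r
# Toy stand-in for the clustered data: one series per row.
# The series names and values are hypothetical.
df <- rbind(
  series_a = c(1, 2, 3, 4, 5),
  series_b = c(1.1, 2.1, 2.9, 4.2, 5.1),
  series_c = c(5, 4, 3, 2, 1),
  series_d = c(9, 1, 9, 1, 9)
)
colnames(df) <- paste0("t", 1:5)

# Stand-in clustering: base R hierarchical clustering on Euclidean distances.
# This is NOT DTW; it only produces a label vector of the same shape
# as hc_sbd@cluster would have.
cluster <- cutree(hclust(dist(df), method = "average"), k = 3)

# One label per clustered row, so the vector lines up with df.
stopifnot(length(cluster) == nrow(df))

# Append the labels as an extra column and export.
labelled <- cbind(df, cluster)
out_file <- tempfile(fileext = ".csv")
write.csv(labelled, out_file)

print(labelled)
```

The length check is the important part: cbind only gives a meaningful result when the label vector has exactly one entry per row of the object it is bound to.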

Appendix:

The whole code would look like this:

library(dtwclust)

### Read the data and reshape it as in the question (one series per row):
df <- read.csv('D:/handling data/confirm.csv', header = TRUE, sep = ',')
rownames(df) <- df$STDR_YM_CD
df$STDR_YM_CD <- NULL
df <- t(df)

### Run dtwclust:
hc_sbd <- dtwclust(df, type = 'h', k = 3L, method = 'average', preproc = zscore,
                   distance = 'dtw', control = list(trace = TRUE))
cluster <- hc_sbd@cluster                     ### Extract the cluster labels (one per row of df)
add_cluster_to_csv <- cbind(df, cluster)      ### Append the labels to the clustered data

### Write to new csv:
write.csv(add_cluster_to_csv, 'Csv_with_cluster.csv')

mobiuscreek answered Mar 12 '23 10:03


I assume STDR_YM_CD is the unique identifier of the items you would like to cluster with DTW.

sc <- read.table("D:/handling data/confirm.csv", header=T, sep="," )
df.labels <- sc$STDR_YM_CD    #rownames(sc) <- sc$STDR_YM_CD
sc$STDR_YM_CD <- NULL
sc <- t(sc)

hc_sbd <- dtwclust(sc, type = 'h', k=3L, method = 'average', preproc = zscore,
           distance = 'dtw', control = list(trace=TRUE) )

hc.clust <- data.frame(STDR_YM_CD = df.labels, dtwclust = hc_sbd@cluster)

sc <- merge(sc,hc.clust, by.x = "STDR_YM_CD", by.y = "STDR_YM_CD")

I just extract the labels (the variable identifying what you are trying to cluster), then create a new data frame from the dtwclust result with the column name dtwclust. I then merge them back based on the unique labels. There are other ways to do this as well, but this is one option. I hope it helps!
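For reference, the extract/label/merge pattern described above can be sketched on toy data. This is only an illustration under stated assumptions: the series_id column is hypothetical and plays the role that this answer gives to STDR_YM_CD, each row is assumed to be one item being clustered, and base R's hclust/cutree stands in for dtwclust (whose labels would come from hc_sbd@cluster).

```r
# Toy data: one row per item to cluster, identified by a hypothetical key column.
dat <- data.frame(
  series_id = c("s1", "s2", "s3", "s4"),
  t1 = c(1, 1.1, 5, 9),
  t2 = c(2, 2.1, 4, 1),
  t3 = c(3, 2.9, 3, 9),
  stringsAsFactors = FALSE
)

# Keep the identifiers aside and cluster only the numeric columns.
labels <- dat$series_id
values <- as.matrix(dat[, setdiff(names(dat), "series_id")])

# Stand-in clustering (Euclidean, not DTW); yields one label per row of values.
cl <- cutree(hclust(dist(values), method = "average"), k = 3)

# Build the lookup table of identifier -> cluster label.
clust_df <- data.frame(series_id = labels, dtwclust = unname(cl),
                       stringsAsFactors = FALSE)

# Merge the labels back onto the original data by the shared key.
merged <- merge(dat, clust_df, by = "series_id")
print(merged)
```

The merge only works if the key column still exists in the object being merged and has one value per clustered row, so it is worth checking both before running it on real data.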

Wayne Lee answered Mar 12 '23 11:03