I have written my own clustering routine and would like to produce a dendrogram. The easiest way to do this would be to use the scipy dendrogram function. However, this requires the input to be in the same format that the scipy linkage function produces. I cannot find an example of how the output of this function is formatted. I was wondering whether someone out there can enlighten me.
This version tracks the distance (size) of each cluster (node_id) and confirms the number of members. It uses the scipy linkage() function, which is the same foundation that agglomerative clusterers are built on.
A linkage matrix is valid if it is a 2-D array (type double) with n − 1 rows and 4 columns. The first two columns must contain cluster indices between 0 and 2n − 1. For a given row i, the following two expressions have to hold: 0 ≤ Z[i, 0] ≤ i + n − 1 and 0 ≤ Z[i, 1] ≤ i + n − 1.
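As a quick sanity check for those conditions, scipy ships is_valid_linkage. Here is a minimal sketch with a hand-built linkage matrix for a hypothetical n = 3 observations:

```python
import numpy as np
from scipy.cluster.hierarchy import is_valid_linkage

# Hand-built linkage matrix for n = 3 observations (made-up example):
# row 0: merge original points 0 and 1 at distance 0.5 -> new cluster 3 (2 members)
# row 1: merge point 2 with cluster 3 at distance 1.2 -> new cluster 4 (3 members)
Z = np.array([[0.0, 1.0, 0.5, 2.0],
              [2.0, 3.0, 1.2, 3.0]], dtype=np.double)

print(is_valid_linkage(Z))  # True: (n-1) x 4, double, indices in range
```

Passing a matrix that violates the index bounds (e.g. a cluster id used before the row that creates it) makes is_valid_linkage return False, which is a useful check before handing a hand-rolled matrix to dendrogram.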
A linkage function is an essential prerequisite for hierarchical cluster analysis. Its value is a measure of the "distance" between two groups of objects (i.e. between two clusters). Algorithms for hierarchical clustering normally differ in the linkage function used.
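For instance (a small sketch with made-up points), the single and complete linkage functions measure the distance between two groups as the closest and farthest pair of points, respectively:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two hypothetical clusters of 2-D points
g1 = np.array([[0.0, 0.0], [1.0, 0.0]])
g2 = np.array([[3.0, 0.0], [5.0, 0.0]])

d = cdist(g1, g2)   # all pairwise distances between the two groups
print(d.min())      # single linkage: closest pair  -> 2.0
print(d.max())      # complete linkage: farthest pair -> 5.0
```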
These functions (in scipy.cluster.hierarchy) cut hierarchical clusterings into flat clusterings, or find the roots of the forest formed by a cut, by providing the flat cluster ids of each observation.
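As a sketch of that cutting step (with hypothetical toy data), fcluster turns a linkage matrix into flat cluster labels at a chosen distance threshold:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six toy 1-D observations forming two obvious groups (made-up data)
x = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
Z = linkage(x, method="single")

# Cut the tree at distance 1.0: every merge below that height
# ends up inside the same flat cluster.
labels = fcluster(Z, t=1.0, criterion="distance")
print(labels)
```

With this data the first three observations share one label and the last three another, since the only merges above the threshold are the ones joining the two groups.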
From the scipy docs for the linkage matrix: A (n − 1) by 4 matrix Z is returned. At the i-th iteration, clusters with indices Z[i, 0] and Z[i, 1] are combined to form cluster n + i. A cluster with an index less than n corresponds to one of the n original observations.
This is from the scipy.cluster.hierarchy.linkage() function documentation; I think it's a pretty clear description of the output format: A (n − 1) by 4 matrix Z is returned. At the i-th iteration, clusters with indices Z[i, 0] and Z[i, 1] are combined to form cluster n + i.
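That rule can be made concrete with a short loop that prints each merge; a sketch using made-up data:

```python
import numpy as np
import scipy.cluster.hierarchy as hac

# Four toy 2-D points (hypothetical data)
pts = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
n = len(pts)
Z = hac.linkage(pts, method="single")

for i, (c0, c1, dist, size) in enumerate(Z):
    # Clusters with index < n are original observations;
    # the merge in row i creates new cluster n + i.
    print(f"step {i}: merge {int(c0)} and {int(c1)} "
          f"-> cluster {n + i} (distance {dist:.3f}, {int(size)} members)")
```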
I agree with https://stackoverflow.com/users/1167475/mortonjt that the documentation does not fully explain the indexing of intermediate clusters, though I also agree with https://stackoverflow.com/users/1354844/dkar that the format is otherwise precisely explained.
Using the example data from this question: Tutorial for scipy.cluster.hierarchy
import numpy as np
import scipy.cluster.hierarchy as hac
from numpy.linalg import norm

A = np.array([[0.1, 2.5],
              [1.5, 0.4],
              [0.3, 1.0],
              [1.0, 0.8],
              [0.5, 0.0],
              [0.0, 0.5],
              [0.5, 0.5],
              [2.7, 2.0],
              [2.2, 3.1],
              [3.0, 2.0],
              [3.2, 1.3]])
A linkage matrix can be built using the single method (i.e., merging on the closest pair of points between two clusters):
z = hac.linkage(A, method="single")

array([[ 7.        ,  9.        ,  0.3       ,  2.        ],
       [ 4.        ,  6.        ,  0.5       ,  2.        ],
       [ 5.        , 12.        ,  0.5       ,  3.        ],
       [ 2.        , 13.        ,  0.53851648,  4.        ],
       [ 3.        , 14.        ,  0.58309519,  5.        ],
       [ 1.        , 15.        ,  0.64031242,  6.        ],
       [10.        , 11.        ,  0.72801099,  3.        ],
       [ 8.        , 17.        ,  1.2083046 ,  4.        ],
       [ 0.        , 16.        ,  1.5132746 ,  7.        ],
       [18.        , 19.        ,  1.92353841, 11.        ]])
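With a matrix in this format, the dendrogram the question asks about is a single call. A sketch using no_plot=True, which returns the layout dict without requiring matplotlib:

```python
import numpy as np
import scipy.cluster.hierarchy as hac

# Same example data as above
A = np.array([[0.1, 2.5], [1.5, 0.4], [0.3, 1.0], [1.0, 0.8],
              [0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [2.7, 2.0],
              [2.2, 3.1], [3.0, 2.0], [3.2, 1.3]])
z = hac.linkage(A, method="single")

# no_plot=True skips drawing and just computes the tree layout
d = hac.dendrogram(z, no_plot=True)
print(d["ivl"])  # leaf labels in left-to-right plot order
```

Calling hac.dendrogram(z) without no_plot draws the figure on the current matplotlib axes; any 2-D double array that satisfies the validity rules above (e.g. from a hand-rolled clustering) can be passed the same way.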
As the documentation explains, clusters with an index below n (here: 11) are simply the data points in the original matrix A. The intermediate clusters that follow are indexed successively.
Thus, clusters 7 and 9 (the first merge) are merged into cluster 11, and clusters 4 and 6 into cluster 12. Now observe the third row, which merges cluster 5 (a point from A) and cluster 12 (the intermediate cluster formed in the second row) with a merge distance of 0.5. The single method entails that the new distance is 0.5, which is the distance between A[5] and the closest point in cluster 12, which contains A[4] and A[6]. Let's check:
In [198]: norm(A[5] - A[4])
Out[198]: 0.70710678118654757

In [199]: norm(A[5] - A[6])
Out[199]: 0.5
This cluster should now be intermediate cluster 13, which is subsequently merged with A[2]. Thus, the new distance should be the smallest distance between point A[2] and any of A[4], A[5], A[6].
In [200]: norm(A[2] - A[4])
Out[200]: 1.019803902718557

In [201]: norm(A[2] - A[5])
Out[201]: 0.58309518948452999

In [202]: norm(A[2] - A[6])
Out[202]: 0.53851648071345048
Which, as can be seen, also checks out: 0.53851648 is exactly the distance recorded in the fourth row of the linkage matrix. This explains the indexing of the new intermediate clusters.
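To make the indexing fully explicit, the original observations inside any intermediate cluster can be recovered by recursing through the linkage matrix. A sketch using the same data as above (the helper name leaves_of is my own, not a scipy function):

```python
import numpy as np
import scipy.cluster.hierarchy as hac

A = np.array([[0.1, 2.5], [1.5, 0.4], [0.3, 1.0], [1.0, 0.8],
              [0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [2.7, 2.0],
              [2.2, 3.1], [3.0, 2.0], [3.2, 1.3]])
n = len(A)
z = hac.linkage(A, method="single")

def leaves_of(cluster_id, Z, n):
    """Return the original observation indices contained in a cluster id."""
    if cluster_id < n:                       # an original observation
        return [cluster_id]
    row = Z[cluster_id - n]                  # the merge that created this cluster
    return leaves_of(int(row[0]), Z, n) + leaves_of(int(row[1]), Z, n)

print(sorted(leaves_of(13, z, n)))  # members of intermediate cluster 13
```

The length of each recovered member list should match the fourth column of the corresponding row of z, which is exactly the "number of members" bookkeeping described above.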