
Using the class sklearn.cluster.SpectralClustering with parameter affinity='precomputed'

I'm having trouble understanding a specific use case of the sklearn.cluster.SpectralClustering class as outlined in the official documentation here. Say I want to use my own affinity matrix to perform clustering. I first instantiate an object of class SpectralClustering as follows:

from sklearn.cluster import SpectralClustering

cl = SpectralClustering(n_clusters=5, affinity='precomputed')

The documentation for the affinity parameter above is as follows:

affinity : string, array-like or callable, default ‘rbf’

If a string, this may be one of ‘nearest_neighbors’, ‘precomputed’, ‘rbf’ or one of the kernels supported by sklearn.metrics.pairwise_kernels. Only kernels that produce similarity scores (non-negative values that increase with similarity) should be used. This property is not checked by the clustering algorithm.

Now the object cl has a method fit for which the documentation about its sole parameter X is as follows:

X : array-like or sparse matrix, shape (n_samples, n_features)

OR, if affinity==precomputed, a precomputed affinity matrix of shape (n_samples, n_samples)

This is where it gets confusing. I am using my own affinity matrix, where a measure of 0 means two points are identical, with a higher number meaning two points are more dissimilar. However, the other choices for the parameter affinity actually take a data set and produce a similarity matrix, for which higher values are indicative of more similarity, and lower values indicate dissimilarity (such as the radial basis kernel).

So when using the fit method on my instance of SpectralClustering, do I actually need to transform my affinity matrix into a similarity matrix before passing it to the fit method call as the parameter X? The same documentation page makes a note on transforming distances to well-behaved similarities, but does not explicitly indicate where this step should be carried out, or via which method call.

R_User asked Dec 11 '13 21:12





1 Answer

Straight from the docs:

If you have an affinity matrix, such as a distance matrix, for which 0 means identical elements, and high values means very dissimilar elements, it can be transformed in a similarity matrix that is well suited for the algorithm by applying the Gaussian (RBF, heat) kernel:

np.exp(- X ** 2 / (2. * delta ** 2))

This goes in your own code, and the result of this can be passed to fit. For the purpose of this algorithm, affinity means similarity, not distance.
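As a minimal sketch of that workflow, assuming a toy data set: the helper name distance_to_similarity, the sample points, and the choice delta=1.0 are all illustrative assumptions, not part of scikit-learn or the question. Only the kernel formula and the affinity='precomputed' usage come from the docs quoted above.

```python
import numpy as np
from sklearn.cluster import SpectralClustering


def distance_to_similarity(distances, delta):
    """Apply the Gaussian (RBF, heat) kernel to a distance matrix.

    Hypothetical helper: maps distance 0 to similarity 1, and large
    distances to similarities close to 0.
    """
    distances = np.asarray(distances, dtype=float)
    return np.exp(-distances ** 2 / (2.0 * delta ** 2))


# Toy data (an assumption for illustration): two loose groups of 2-D points.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 2)),
    rng.normal(loc=3.0, scale=0.5, size=(20, 2)),
])

# Pairwise Euclidean distances: 0 on the diagonal, larger = more dissimilar.
distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# delta is a free bandwidth parameter; 1.0 is an arbitrary choice here.
similarity = distance_to_similarity(distances, delta=1.0)

# Pass the similarity matrix (not the distance matrix) to fit.
cl = SpectralClustering(n_clusters=2, affinity='precomputed', random_state=0)
cl.fit(similarity)
labels = cl.labels_
```

The value of delta controls how quickly similarity decays with distance, so it is worth trying a few values on real data rather than relying on the one used here.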

Fred Foo answered Oct 31 '22 06:10
