Using the class sklearn.cluster.SpectralClustering with parameter affinity='precomputed'

I'm having trouble understanding a specific use case of the sklearn.cluster.SpectralClustering class as outlined in the official documentation. Say I want to use my own affinity matrix to perform clustering. I first instantiate an object of class SpectralClustering as follows:

from sklearn.cluster import SpectralClustering

cl = SpectralClustering(n_clusters=5, affinity='precomputed')

The documentation for the affinity parameter above is as follows:

affinity : string, array-like or callable, default ‘rbf’

If a string, this may be one of ‘nearest_neighbors’, ‘precomputed’, ‘rbf’ or one of the kernels supported by sklearn.metrics.pairwise_kernels. Only kernels that produce similarity scores (non-negative values that increase with similarity) should be used. This property is not checked by the clustering algorithm.

Now the object cl has a method fit for which the documentation about its sole parameter X is as follows:

X : array-like or sparse matrix, shape (n_samples, n_features)

OR, if affinity==precomputed, a precomputed affinity matrix of shape (n_samples, n_samples)

This is where it gets confusing. I am using my own affinity matrix, in which a value of 0 means two points are identical and higher values mean the points are more dissimilar (so it is effectively a distance matrix). However, the other choices for the parameter affinity, such as the radial basis kernel, take a data set and produce a similarity matrix, in which higher values indicate more similarity and lower values indicate dissimilarity.

So when using the fit method on my instance of SpectralClustering, do I actually need to transform my affinity matrix into a similarity matrix before passing it to the fit method call as the parameter X? The same documentation page includes a note on transforming distances into well-behaved similarities, but does not explicitly indicate where this step should be carried out, or via which method call.

asked Dec 11 '13 21:12

R_User

1 Answer

Straight from the docs:

If you have an affinity matrix, such as a distance matrix, for which 0 means identical elements, and high values means very dissimilar elements, it can be transformed in a similarity matrix that is well suited for the algorithm by applying the Gaussian (RBF, heat) kernel:

np.exp(- X ** 2 / (2. * delta ** 2))

This transformation goes in your own code, before the call to fit; you then pass its result to fit as X. For the purpose of this algorithm, affinity means similarity, not distance.
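To make that concrete, here is a minimal sketch of the whole flow, not a definitive recipe. The toy data, the use of pairwise_distances to build the distance matrix, and the choice delta = 1.0 are all assumptions made for illustration; delta is the kernel width from the formula above and you would need to tune it for your own distances.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import pairwise_distances

# Hypothetical toy data: two well-separated blobs of 2-D points.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=3.0, scale=0.3, size=(20, 2)),
])

# Stand-in for "my own" matrix: 0 means identical, larger means more dissimilar.
distances = pairwise_distances(points, metric="euclidean")

# Kernel width. The value 1.0 is an assumption chosen for this toy data.
delta = 1.0

# Gaussian (RBF, heat) kernel: turns distances into similarities in (0, 1],
# with 1 on the diagonal (distance 0) and values near 0 for distant pairs.
similarity = np.exp(-distances ** 2 / (2.0 * delta ** 2))

# With affinity='precomputed', X is the (n_samples, n_samples) similarity matrix.
cl = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = cl.fit_predict(similarity)
```

The point of the sketch is only the ordering: the distance-to-similarity conversion happens in user code, and the estimator never sees the raw distances.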

answered Oct 31 '22 06:10

Fred Foo

Tags: python, cluster-analysis, scikit-learn
