Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way to test a clustering algorithm

What is the best way to test a clustering algorithm? I am using an agglomerative clustering algorithm with a stop criterion. How do I test if the clusters are formed correctly or not?

like image 828
London guy Avatar asked Apr 23 '12 10:04

London guy


1 Answers

A good rule of thumb for evaluating how much a graph can be clustered (on a coarse grained level) has to do with the "eigenvalue gap". Given a weighted graph A, calculate the eigenvalues and sort them (this is the eigenvalue spectrum). When plotted, if there is a large jump in the spectrum at some point, there is a natural corresponding block to partition the graph.

Below is an example (in numpy python) that shows, given an almost block diagonal matrix there a large gap in the eigenvalue spectrum at the number of blocks (parameterized by c in the code). Note that a matrix permutation (identical to labeling your graph nodes) still gives the same spectral gap:

from numpy import *
import pylab as plt

# Make a block diagonal matrix
N = 30
c = 5
A = zeros((N*c,N*c))
for m in xrange(c):
    A[m*N:(m+1)*N, m*N:(m+1)*N] = random.random((N,N))

# Add some noise
A += random.random(A.shape) * 0.1

# Make symmetric
A += A.T - diag(A.diagonal())

# Show the original matrix
plt.subplot(131)
plt.imshow(A.copy(), interpolation='nearest')

# Permute the matrix for effect
idx = random.permutation(N*c)
A = A[idx,:][:,idx]

# Compute eigenvalues
L = linalg.eigvalsh(A)

# Show the results
plt.subplot(132)
plt.imshow(A, interpolation='nearest')
plt.subplot(133)
plt.plot(sorted(L,reverse=True))

plt.plot([c-.5,c-.5],[0,max(L)],'r--')

plt.ylim(0,max(L))
plt.xlim(0,20)
plt.show() 

enter image description here

like image 133
Hooked Avatar answered Sep 29 '22 01:09

Hooked