Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Markov Clustering Algorithm

I've been working through the following example of the details of the Markov Clustering algorithm:

http://www.cs.ucsb.edu/~xyan/classes/CS595D-2009winter/MCL_Presentation2.pdf

I feel like I have accurately represented the algorithm but I am not getting the same results that this guide, at least, was getting for that input.

Current code is at: http://jsfiddle.net/methodin/CtGJ9/

I am not sure if perhaps I have just missed a small fact or just need a small tweak somewhere for this but I have tried a few variations including:

  1. Swapping the Inflation/Expansion
  2. Checking for equality based on a precision
  3. Removing the normalization (since the original guide did not require it, although the official MCL documentation states to normalize the matrix on every pass)

All of these have returned the same result - the node only influences itself.

I've even found a similar algorithm implementation in VB: http://mcl.codeplex.com/SourceControl/changeset/changes/17748#MCL%2fMCL%2fMatrix.vb

And my code seems to match up with the exception of their numbering (600 - distance for instance).

This is the expansion function

// Take the (power)th power of the matrix effectively multiplying it with
// itself pow times
this.matrixExpand = function(matrix, pow) {
    var resultMatrix = [];
    for(var row=0;row<matrix.length;row++) {
        resultMatrix[row] = [];
        for(var col=0;col<matrix.length;col++) {
            var result = 0;
            for(var c=0;c<matrix.length;c++)
                result += matrix[row][c] * matrix[c][col];
            resultMatrix[row][col] = result;
        }
    }
    return resultMatrix;
}; 

And this is the inflation function

// Applies a power of X to each item in the matrix
this.matrixInflate = function(matrix, pow) {
    for(var row=0;row<matrix.length;row++) 
        for(var col=0;col<matrix.length;col++)
            matrix[row][col] = Math.pow(matrix[row][col], pow);
};

And finally the main passthru function

// Girvan–Newman algorithm
this.getMarkovCluster = function(power, inflation) {
    var lastMatrix = [];

    var currentMatrix = this.getAssociatedMatrix();
    this.print(currentMatrix);        
    this.normalize(currentMatrix);  

    currentMatrix = this.matrixExpand(currentMatrix, power);    
    this.matrixInflate(currentMatrix, inflation);                               
    this.normalize(currentMatrix);

    while(!this.equals(currentMatrix,lastMatrix)) {
        lastMatrix = currentMatrix.slice(0);

        currentMatrix = this.matrixExpand(currentMatrix, power);                
        this.matrixInflate(currentMatrix, inflation);         
        this.normalize(currentMatrix);            
    }
    return currentMatrix;
};
like image 955
methodin Avatar asked Jan 06 '12 20:01

methodin


People also ask

What is Markov clustering algorithm?

The Markov Cluster (MCL) Algorithm is an unsupervised cluster algorithm for graphs based on simulation of stochastic flow in graphs. Markov clustering was the work of Stijn van Dongen and you can read his thesis on the Markov Cluster Algorithm.

What are the algorithms used for clustering?

k-means is the most widely-used centroid-based clustering algorithm. Centroid-based algorithms are efficient but sensitive to initial conditions and outliers. This course focuses on k-means because it is an efficient, effective, and simple clustering algorithm.

What is a hierarchical clustering algorithm?

Hierarchical clustering, also known as hierarchical cluster analysis, is an algorithm that groups similar objects into groups called clusters. The endpoint is a set of clusters, where each cluster is distinct from each other cluster, and the objects within each cluster are broadly similar to each other.

What is K-means clustering in string?

K-means is a popular clustering algorithm which is widely used in anomaly-based intrusion detection. It tries to classify a given data set into k (a predefined number) categories.


1 Answers

Your implementation is correct. The example is just wrong.

The three result matrices on the "Repeat steps 5 and 6 until a steady state is reached (convergence)" slide are the results of ONLY performing inflation.

To clarify some points about the algorithm.

  1. Expantion then inflation.
  2. Precision is an important factor. The equations can never lead to convergence mathematically. Its the limited precision of floating point operations on cpus that cause some items in the matrix to become zero rather than infinitly small numbers. In fact the official implementation uses a cutoff value to eliminate a certain amount of items per column to speed up convergence and improve the time complexity of the algorithm. In the original thesis the author analized the effect of this and concluded that the cutoff gives the same result in practice as a slight increase of the inflation parameter.
  3. Normalization is an integral part of the inflation step, read the equation in that guide again.

Regarding your code.

  1. array.slice creates a shallow copy but this not important in your case since you create a new array in matrixExpand.
  2. Your implementation of matrixExpand ignores the pow variable, and always does a power of 2.

The result where there are all ones in the first row is expected, which is interpreted as all elements are in the same cluster.

like image 55
Michał Modzelewski Avatar answered Oct 15 '22 06:10

Michał Modzelewski