Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Infomap community detection understanding

i need a understandable description of the Infomap Community Detection Algorithm. I read the papers, but it was not clear for me. My questions:

  1. How does the algorithm basically work?
  2. What has random walks to do with it?
  3. What is the map equation and what is (clearly) the difference to modularity optimization? (There was an example given in the paper in Fig. 3 , but i didn't get that)
  4. On their homepage, there are 2 improvements given. The first one is Submodule movements and the second one is Single-node movements. Why are they used and why are merged modules not seperateable?

The homepage: http://www.mapequation.org/code.html

The paper: http://www.mapequation.org/assets/publications/EurPhysJ2010Rosvall.pdf

like image 565
Sully Avatar asked Jan 30 '18 18:01

Sully


1 Answers

The basic idea behind the InfoMap algorithm is to use community partitions of the graph as a Huffman code that compresses the information about a random walker exploring your graph.

Let's unpack what that means. The central object is a random walker exploring the network with the probability that the walker transitions between two nodes given by its Markov transition matrix. At this point, we have effectively coded our network with an individual codeword for each node. However, in most real-world networks, we know that there are regions of the network such that once the random walker enters a region, it tends to stay there for a long time, and movements between the regions are relatively rare. This allows us to combinatorially combine codewords into Huffman codes: we can use a prefix code for each region, and then use a unique codeword for each node within a module, but we can reuse these node level codewords for each module. The same intuition can be gathered by looking at a street names; it would be crazy to have a unique street name for every street in the US, instead, we use states and towns, and then specify a street name, allowing us to reuse street names between towns (how many Main streets are there?).

Here is where the optimization algorithm comes onto the scene: when you use too few modules, you are effectively still back at the level of using an individual codeword for every node, but use too many modules, and the number of prefix codes becomes too large. So we need to find an optimal partition that assigns nodes to modules such that the information needed to compress the movement of our random walkers is minimized (equation 1 from their paper).

The number of possible partitions grows super exponentially in the number of nodes (given by the Bell numbers), so its impossible to do brute-force searches. Instead the authors leverage a variation of the Louvain method (originally designed for modularity maximization) to help them find appropriate partitions. The 2 ``improvements'' you ask about (question 4) are just heuristics to help effectively explore partition space: submodule exploration checks to verify that we didn't create a module that was too large and should have been broken into smaller modules, while single-node movements allow individual nodes to shift between modules.

The InfoMap algorithm and Modularity are both instances of optimal community detection methods: they each have a quality function and then search the space of graph partitions to find the partition that optimizes that quality function. The difference is the quality function: InfoMap focuses on the information needed to compress the movement of the random walker, while Modularity defines modules based on edge density (more edges within a module than expected by chance).

like image 154
ComplexGates Avatar answered Oct 20 '22 02:10

ComplexGates