
How do I calculate the entropy of a graph?

I have a set of randomly generated formal graphs, and I would like to calculate the entropy of each one. The same question in different words: I have several networks, and want to calculate the information content of each one.

Here are two sources containing formal definitions of graph entropy:
http://www.cs.washington.edu/homes/anuprao/pubs/CSE533Autumn2010/lecture4.pdf (PDF)
http://arxiv.org/abs/0711.4175v1

The code I am looking for takes a graph as input (as either an edge list or an adjacency matrix) and outputs a number of bits or some other measure of information content.
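For the edge-list case, a minimal sketch of converting to an adjacency matrix in base R might look like this. The helper name `edgeListToAdjacency` and the example edges are hypothetical, and the sketch assumes an undirected, unweighted graph whose nodes are numbered 1..n.

```r
# Hypothetical helper (not from any package): build a symmetric 0/1
# adjacency matrix from a two-column edge list of node ids 1..n.
edgeListToAdjacency <- function(edges, n = max(edges)) {
  adj <- matrix(0, n, n)
  adj[edges] <- 1                        # mark i -> j (two-column matrix indexing)
  adj[edges[, 2:1, drop = FALSE]] <- 1   # mark j -> i so the matrix is symmetric
  diag(adj) <- 0                         # drop any self-loops
  adj
}

# Made-up example: a 4-node path graph 1 - 2 - 3 - 4
edges <- rbind(c(1, 2), c(2, 3), c(3, 4))
adj <- edgeListToAdjacency(edges)
print(adj)
```

Either representation then carries the same information, so an entropy function only needs to accept one of them.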

Because I can't find an implementation of this anywhere, I am setting out to code this from scratch based on the formal definitions. If anyone has already solved this problem and is willing to share the code, it would be wildly appreciated.

shotgun_approach asked Aug 05 '11 02:08


3 Answers

I ended up using different papers for definitions of graph entropy:
Information Theory of Complex Networks: On Evolution and Architectural Constraints
R.V. Sole and S. Valverde (2004)
and
Network Entropy Based on Topology Configuration and Its Computation to Random Networks
B.H. Wang, W.X. Wang and T. Zhou

The code to calculate each is below. The code assumes you have an undirected, unweighted graph with no self-loops. It takes an adjacency matrix as input and returns the amount of entropy in bits. It is implemented in R and makes use of the sna package.

library(sna)  # provides degree() and gden()

graphEntropy <- function(adj, type="SoleValverde") {
  if (type == "SoleValverde") {
    return(graphEntropySoleValverde(adj))
  }
  else {
    return(graphEntropyWang(adj))
  }
}

graphEntropySoleValverde <- function(adj) {
  # Calculate Sole & Valverde, 2004 graph entropy
  # Uses Equations 1 and 4
  # First we need the denominator of q(k)
  # To get it we need the probability of each degree
  # First get the number of nodes with each degree
  existingDegrees = degree(adj)/2
  maxDegree = nrow(adj) - 1
  allDegrees = 0:maxDegree
  degreeDist = matrix(0, 3, length(allDegrees)+1) # Need an extra zero prob degree for later calculations
  degreeDist[1,] = 0:(maxDegree+1)
  for(aDegree in allDegrees) {
    degreeDist[2,aDegree+1] = sum(existingDegrees == aDegree)
  }
  # Calculate probability of each degree
  for(aDegree in allDegrees) {
    degreeDist[3,aDegree+1] = degreeDist[2,aDegree+1]/sum(degreeDist[2,])
  }
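  # Sum of all degrees mult by their probability (the mean degree)
  # Row 1 holds the degree values, row 2 the counts, row 3 the probabilities
  sumkPk = 0
  for(aDegree in allDegrees) {
    sumkPk = sumkPk + degreeDist[1,aDegree+1] * degreeDist[3,aDegree+1]
  }
  # Equivalent is sum(degreeDist[1,] * degreeDist[3,])
  # A graph with no edges has mean degree zero; its entropy is zero
  if (sumkPk == 0) {
    return(0)
  }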
  # Now we have all the pieces we need to calculate graph entropy
  graphEntropy = 0
  for(aDegree in seq_len(maxDegree)) { # seq_len() is empty when maxDegree is 0, unlike 1:0
    q.of.k = ((aDegree + 1)*degreeDist[3,aDegree+2])/sumkPk
    # 0 log2(0) is defined as zero
    if (q.of.k != 0) {
      graphEntropy = graphEntropy + -1 * q.of.k * log2(q.of.k)
    }
  }
  return(graphEntropy)
}

graphEntropyWang <- function(adj) {
  # Calculate Wang, 2008 graph entropy
  # Uses Equation 14
  # bigN is simply the number of nodes
  # littleP is the link probability.  That is the same as graph density calculated by sna with gden().
  bigN = nrow(adj)
  littleP = gden(adj)
  graphEntropy = 0
  if (littleP != 1 && littleP != 0) {
    graphEntropy = -1 * .5 * bigN * (bigN - 1) * (littleP * log2(littleP) + (1-littleP) * log2(1-littleP))
  }
  return(graphEntropy)
}
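As a cross-check that does not need sna, here is a compact base-R sketch of the same two quantities. The function names `remainingDegreeEntropy` and `densityEntropy` are hypothetical. The first assumes q(k) = (k+1) P(k+1) / sum(k P(k)) and sums over k >= 1; the second assumes the link probability is the fraction of node pairs that are connected.

```r
# Hypothetical base-R cross-check; function names are illustrative only.
# Both expect a symmetric 0/1 adjacency matrix with a zero diagonal.

remainingDegreeEntropy <- function(adj) {
  n <- nrow(adj)
  deg <- rowSums(adj)                          # degree of each node
  pk <- tabulate(deg + 1, nbins = n + 1) / n   # pk[k + 1] = P(k) for k = 0..n
  meanK <- sum((0:n) * pk)                     # mean degree, sum of k * P(k)
  if (meanK == 0) return(0)                    # no edges
  k <- seq_len(n - 1)                          # k = 1..n-1
  qk <- (k + 1) * pk[k + 2] / meanK            # q(k) = (k+1) P(k+1) / mean degree
  qk <- qk[qk > 0]                             # 0 * log2(0) is taken as zero
  -sum(qk * log2(qk))
}

densityEntropy <- function(adj) {
  n <- nrow(adj)
  if (n < 2) return(0)                         # no node pairs to connect
  pairs <- n * (n - 1) / 2                     # number of possible edges
  p <- sum(adj[upper.tri(adj)]) / pairs        # link probability (density)
  if (p == 0 || p == 1) return(0)
  -pairs * (p * log2(p) + (1 - p) * log2(1 - p))
}

# Made-up example: a 4-node path graph 1 - 2 - 3 - 4
adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1
adj <- adj + t(adj)

print(remainingDegreeEntropy(adj))
print(densityEntropy(adj))   # 6 bits: 6 possible edges, each present with p = 0.5
```

On small graphs like this one the values can be checked by hand, which makes it easier to spot indexing mistakes in the degree table.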
shotgun_approach answered Sep 18 '22 18:09


If you have a weighted graph, a good start would be to sort and count all the weights. Then you can use the formula -log(p)+log(2) (http://en.wikipedia.org/wiki/Binary_entropy_function) to determine the number of bits needed for the code. This might not work, though, because it's the binary entropy function.
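One possible reading of this suggestion is to take the Shannon entropy of the empirical weight distribution. The sketch below does that; it is an interpretation rather than the exact formula quoted above, and the function name `weightEntropy` and the example weights are made up.

```r
# Hypothetical sketch: Shannon entropy (in bits) of the empirical
# distribution of edge weights.
weightEntropy <- function(weights) {
  counts <- table(weights)                 # sorts the distinct weights and counts them
  p <- as.numeric(counts) / sum(counts)    # relative frequency of each weight
  -sum(p * log2(p))                        # average bits per edge weight
}

# Made-up example data: three distinct weights with frequencies 1/4, 1/4, 1/2
weights <- c(1, 1, 2, 2, 3, 3, 3, 3)
print(weightEntropy(weights))   # 1.5
```

Multiplying the result by the number of edges gives a rough estimate of the bits needed to encode all the weights, ignoring the graph's topology.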

Micromega answered Sep 17 '22 18:09


You can use Körner's entropy (= Shannon entropy applied to a graph). A good reference for the literature is here. Note, however, that the computation is NP-hard in general (for the stupid reason that you need to search over all subsets of vertices).

dohmatob answered Sep 18 '22 18:09