Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Graph Theory: Calculating Clustering Coefficient

I'm doing some research and I've come to a point where I have calculate the clustering coefficient of a graph.

According to this paper directly related to my research:

The clustering coefficient C(p) is defined as follows. Suppose that a vertex v has kv neighbours; then at most (kv * (kv-1)) / 2 edges can exist between them (this occurs when every neighbour of v is connected to every other neighbour of v). Let Cv denote the fraction of these allowable edges that actually exist. Define C as the average of Cv over all v

But this wikipedia article on the subject says differently:

C = (number of closed triplets) / (number of connected triples)

It seems to me that the latter is more computationally expensive.

So really my question is: are they equivalent?

It should be noted that the paper is cited by the Wikipedia article.

Thanks for your time.

like image 839
Griffin Avatar asked Jul 10 '11 20:07

Griffin


2 Answers

The two formulas are not the same; they are two different ways in which the global clustering coefficient can be calculated.

One way is by averaging the clustering coefficients (C_i [1]) of all nodes (this is the method you quoted from Watts and Strogatz). However, in [2, p204] Newman argues that this method is less preferable than the second one (the one you got from wikipedia). He justifies by pointing how the value of the global clustering coeff can be dominated by nodes of low degree, due to C_i's denominator [1]. So, in a network with many nodes of low degrees, you end up with a large value for the global clustering coeff, which Newman argues would be unrepresentative.

However, many network studies (or, in my experience, at least many studies concerned with online social networks) seem to have used this method, so in order to be able to compare your results with theirs, you would require to use the same method. Furthermore, the critique raised by Newman does not affect the extent to which comparisons of global clustering coefficients can be made, provided the same method was used in measuring them.

The two formulae are different and were proposed at different moments in time. The one you quoted from Watt and Strogatz is older, which is perhaps why it seems to have been more commonly used. Newman also explains that the two formulae are far from equivalent, and shouldn't be used as such. He says they can give substantially different numbers for a given network, however doesn't explain why.

[1] C_i = (number of pairs of neighbours of i that are connected) / (number of pairs of neighbours of i)

[2] Newman, M.E.J.. Networks : an introduction. Oxford New York: Oxford University Press, 2010. Print.

Edit:

I am including here a series of calculations for the same ER random graph. You can see how the two methods give different results, even for undirected graphs. (done using Mathematica)

like image 50
WindChimes Avatar answered Oct 08 '22 12:10

WindChimes


I think they're equivalent. The wiki page you link to gives a proof that the triples formulation is equivalent to the fraction of possible edges formulation when calculating the local clustering coefficient, i.e. calculated just at a vertex. From there it seems that you just need to show that

sum_v lambda(v)/tau(v) = 3 x # triangles / # connected triples

where lambda(v) is the number of triangles containing v, and tau(v) is the number of connected triples for which v is the middle vertex, i.e. adjacent to each of the other 2 edges.

Now each triangle gets counted three times in the numerator of the LHS. However, each connected triple is only counted once for the middle vertex on the LHS, so the denominators are the same.

like image 44
Whatang Avatar answered Oct 08 '22 13:10

Whatang