How to calculate Eb(k) of networks with Python?

In the paper titled Scaling of degree correlations and its influence on diffusion in scale-free networks, the authors define the quantity of $E_b(k)$ to measure the extent of degree correlations.


Paper

L. K. Gallos, C. Song, and H. A. Makse, Scaling of Degree Correlations and Its Influence on Diffusion in Scale Free Networks, Phys. Rev. Lett. 100, 248701 (2008).

You can read the article at this link or in the related Google book.

Question

(Figure: my computed $E_b(k)$ for the Condensed Matter data.)

My question is how to calculate Eb(k) of networks with Python. My problem is that I can't reproduce the authors' results. I tested it using the Condensed Matter data, and the resulting Eb(k) is shown in the figure above. You can see that one problem in my figure is that Eb(k) is much larger than 1! I have also tried the Internet (AS-level) data and the WWW data, and the problem persists. No doubt there is something seriously wrong with my algorithm or code. You can reproduce my results and compare them with the authors'. Your solutions or suggestions are highly appreciated. I will introduce my algorithm and Python script below.

I follow these steps:

  1. For each edge, find the edges whose endpoint degrees are k and k' with k' >= b*k (here b = 3). The fraction of such edges among all edges is denoted P(k, k').
  2. For each k, get the proportion of nodes whose degree k' is at least b*k, denoted p(k'); from it we also have k'*p(k').
  3. Get the numerator P1: P1 = \sum P(k, k')/(k'*p(k'))
  4. Get the denominator P2: P2 = \sum p(k')
  5. Eb(k) = P1/P2
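Written out, the steps above compute the discrete estimator

$$E_b(k) = \frac{\sum_{k' \ge bk} P(k, k') / \big(k'\,p(k')\big)}{\sum_{k' \ge bk} p(k')}$$

with b = 3 in the tests below.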

Python script

The python script is given below:

%matplotlib inline
import networkx as nx
import matplotlib.cm as cm
import matplotlib.pyplot as plt
from collections import defaultdict

def ebks(g, b):
    edge_dict = defaultdict(lambda: defaultdict(int))
    degree_dict = defaultdict(int)
    edge_degree = [sorted(g.degree(e).values()) for e in g.edges()]
    for e in edge_degree:
        edge_dict[e[0]][e[-1]] += 1
    for i in g.degree().values():
        degree_dict[i] += 1
    edge_number = g.number_of_edges()
    node_number = g.number_of_nodes()
    ebks, ks = [], []
    for k1 in edge_dict:
        p1, p2 = 0, 0
        for k2 in edge_dict[k1]:
            if k2 >= b*k1:
                pkk = float(edge_dict[k1][k2])/edge_number
                pk2 = float(degree_dict[k2])/node_number
                k2pk2 = k2*pk2
                p1 += pkk/k2pk2
        for k in degree_dict:
            if k >= b*k1:
                pk = float(degree_dict[k])/node_number
                p2 += pk
        if p2 > 0:
            ebks.append(p1/p2)
            ks.append(k1)
    return ebks, ks

I tested with the ca-CondMat data; you can download it from this URL: http://snap.stanford.edu/data/ca-CondMat.html

# Load the data
# Remember to change the file path to your own
ca = nx.Graph()
with open('/path-of-your-file/ca-CondMat.txt') as f:
    for line in f:
        if line[0] != '#':
            x, y = line.strip().split('\t')
            ca.add_edge(x, y)
nx.info(ca)

# calculate ebk
ebk, k = ebks(ca, b=3)

plt.plot(k, ebk, 'r^')
plt.xlabel(r'$k$', fontsize=16)
plt.ylabel(r'$E_b(k)$', fontsize=16)
plt.xscale('log')
plt.yscale('log')
plt.show()

Update: The problem has not been solved yet.

import numpy as np  # needed for np.sum below

def ebkss(g, b, x):
    edge_dict = defaultdict(lambda: defaultdict(int))
    degree_dict = defaultdict(int)
    edge_degree = [sorted(g.degree(e).values()) for e in g.edges()]
    for e in edge_degree:
        edge_dict[e[0]][e[-1]] += 1
    for i in g.degree().values():
        degree_dict[i] += 1
    edge_number = g.number_of_edges()
    node_number = g.number_of_nodes()
    ebks, ks = [], []
    for k1 in edge_dict:
        p1, p2 = 0, 0
        nk2k = np.sum(edge_dict[k1].values())
        pk1 = float(degree_dict[k1])/node_number
        k1pk1 = k1*pk1
        for k2 in edge_dict[k1]:
            if k2 >= b*k1:
                pk2k = float(edge_dict[k1][k2])/nk2k
                pk2 = float(degree_dict[k2])/node_number
                k2pk2 = k2*pk2
                p1 += (pk2k*k1pk1)/k2pk2
        for k in degree_dict:
            if k >= b*k1:
                pk = float(degree_dict[k])/node_number
                p2 += pk
        if p2 > 0:
            ebks.append(p1/p2**x)
            ks.append(k1)
    return ebks, ks
asked Jul 16 '16 by Frank Wang


1 Answer

According to the paper, the purpose of Eb(k) is to get the correlation exponent epsilon: "[We] introduce a scale-invariant quantity Ebk to simplify the estimation of epsilon" (second page, bottom of first column).

I haven't found a way to make Eb(k) < 1, but I have found a correction that computes epsilon correctly.

According to equation 4, Eb(k) ~ k^-(epsilon-gamma) (where the degree distribution P(k) ~ k^-gamma, a power law). Thus, if we fit the slope of log(Eb(k)) against log(k), we should get gamma - epsilon. Knowing gamma, we can then easily get epsilon.

Note that this slope is invariant if Eb(k) is scaled by a constant. Thus, the problem with your computed Eb(k) is not that it's greater than 1, but that it gives a log-slope of about 0.5 against k, whereas in the paper the slope is about 1.2, so you will get the wrong epsilon.
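To make that concrete, here is a minimal, self-contained sketch on synthetic data (the values gamma = 2.8 and epsilon = 1.6 mirror the cond-mat numbers discussed below; the prefactor 0.5 is arbitrary, illustrating that constant rescaling does not change the slope):

```python
import numpy as np

# Synthetic Eb(k) obeying Eq. 4: Eb(k) ~ k^(gamma - epsilon).
# gamma = 2.8 and epsilon = 1.6 mirror the cond-mat values; 0.5 is an
# arbitrary prefactor showing that rescaling leaves the slope untouched.
gamma, epsilon = 2.8, 1.6
k = np.array([4.0, 8.0, 16.0, 32.0, 64.0, 128.0])
ebk = 0.5 * k ** (gamma - epsilon)

# Fit log(Eb(k)) = slope * log(k) + const; Eq. 4 gives slope = gamma - epsilon.
slope = np.polyfit(np.log(k), np.log(ebk), 1)[0]
eps_est = gamma - slope
print(round(eps_est, 2))  # recovers epsilon = 1.6
```

The same fit applied to a rescaled `c * ebk` gives an identical slope, which is why the overall magnitude of Eb(k) does not matter for estimating epsilon.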

My Algorithm

I started off by copying your code, looking over it, and re-implementing it in an equivalent way. My re-implementation replicated your results. I'm quite confident that you implemented the discrete version of the formula for E_b(k) correctly. However, a close examination of the paper suggests that the authors used smooth approximations in their code.

On the second page and column, the equality P(k|k') = P(k, k')/ (k')^(1-gamma) is stated. This is equivalent to replacing the exact probability P(k') in the denominator of the first integral with the smooth power-law approximation (k')^(-gamma) of the degree distribution, and is not an equality.
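Spelled out, with the exact conditional probability $P(k|k') = P(k,k')/\big(k'\,P(k')\big)$ (up to a constant factor), substituting $P(k') \approx (k')^{-\gamma}$ gives

$$P(k|k') = \frac{P(k,k')}{k'\,P(k')} \approx \frac{P(k,k')}{k'\,(k')^{-\gamma}} = \frac{P(k,k')}{(k')^{1-\gamma}}.$$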

The fact that the authors state this approximation as an equality, without qualification, suggests to me that they may have used it as such in their code. So I decided to use their approximation in my code, resulting in the function below (how I got gamma = 2.8 for cond-mat is explained further down).

import numpy as np  # needed for np.sum below

def ebkss(g, b, gamma=2.8):
    edge_dict = defaultdict(lambda: defaultdict(int))
    degree_dict = defaultdict(int)
    edge_degree = [sorted(g.degree(e).values()) for e in g.edges()]
    for e in edge_degree:
        edge_dict[e[0]][e[-1]] += 1
    for i in g.degree().values():
        degree_dict[i] += 1
    edge_number = g.number_of_edges()
    node_number = g.number_of_nodes()
    ebks, ks = [], []
    for k1 in edge_dict:
        p1, p2 = 0, 0
        nk2k = np.sum(edge_dict[k1].values())
        pk1 = float(degree_dict[k1])/node_number
        k1pk1 = k1*pk1
        for k2 in edge_dict[k1]:
            if k2 >= b*k1:
                pk2k = float(edge_dict[k1][k2])/edge_number
                pk2 = float(degree_dict[k2])/node_number
                # replace k2*P(k2) with the power-law approximation k2**(1-gamma)
                p1 += pk2k/(k2*k2**(-gamma))
        for k in degree_dict:
            if k >= b*k1:
                pk = float(degree_dict[k])/node_number
                p2 += pk
        if p2 > 0 and p1 > 0:
            ebks.append(p1/p2)
            ks.append(k1)
    return ebks, ks

The Results

Using this code:

import numpy as np
from numpy import linalg as la

def get_logslope(x, y):
    A = np.empty((len(x), 2))
    A[:, 0] = np.log(x)
    A[:, 1] = 1
    res = la.lstsq(A, np.log(y))
    return res[0][0]  # the fitted slope

def show_eb(ca, b, gamma):
    # calculate ebk
    ebk, k = ebkss(ca, b=b, gamma=gamma)
    print "Slope = ", get_logslope(np.array(k), np.array(ebk))
    plt.plot(k, ebk, 'r^')
    plt.xlabel(r'$k$', fontsize=16)
    plt.ylabel(r'$E_b(k)$', fontsize=16)
    plt.xscale('log')
    plt.yscale('log')
    plt.show()

show_eb(ca, 3, 2.8)

I got this output:

Slope =  1.22136715547 

Plot of Eb(k) for the cond-mat network

The slope (up to 1 digit after the decimal point, which is all that is given in the paper) is correct, and hence epsilon can now be computed correctly.

About Gamma

I got the value of gamma = 2.8 from adding the slope of 1.2 to the epsilon value of 1.6 (this follows from equation 4 of the paper). I also did a quick sanity check using the powerlaw Python module to determine if this gamma was a decent fit.

import powerlaw
res = powerlaw.Fit(np.array(ca.degree().values())+1, xmin=10)
print res.alpha

The output is

2.84571139756 

so 2.8 is correct for the value of gamma, up to rounding.

Edit with WWW data

I tested out my method with the WWW dataset. I ended up getting a slope that was close to the one in the paper, but the scaling is still off. Here's my code:

def log_binning(x, y, bin_count=50):
    max_x = np.log10(max(x))
    max_y = np.log10(max(y))
    max_base = max([max_x, max_y])
    xx = [i for i in x if i > 0]
    min_x = np.log10(np.min(xx))
    bins = np.logspace(min_x, max_base, num=bin_count)
    hist = np.histogram(x, bins)[0]
    nonzero_mask = np.logical_not(hist == 0)
    hist[hist == 0] = 1
    bin_means_y = (np.histogram(x, bins, weights=y)[0] / hist)
    bin_means_x = (np.histogram(x, bins, weights=x)[0] / hist)
    return bin_means_x[nonzero_mask], bin_means_y[nonzero_mask]

def single_line_read(fname):
    g = nx.Graph()
    with open(fname, "r") as f:
        for line in f:
            a = map(int, line.strip().split(" "))
            g.add_edge(a[0], a[1])
    return g

www = single_line_read("data/www.dat")
ebk, k = ebkss(www, 3, 2.6)
lk, lebk = log_binning(np.array(k, dtype=np.float64), np.array(ebk), bin_count=70)
#print lk, lebk
print "Slope", get_logslope(lk, lebk)
plt.plot(lk, lebk/www.number_of_edges(), 'r^')
plt.xlabel(r'$k$', fontsize=16)
plt.ylabel(r'$E_b(k)$', fontsize=16)
plt.xscale('log')
plt.yscale('log')
plt.show()

Slope 0.162453554297

(Plot of Eb(k) for the WWW data.)

The slope from the original paper is 0.15. I got the gamma value of 2.6 by looking at Figure 3 in the paper (the gamma-epsilon chart).

In Conclusion

I'm not sure why Eb(k) is so much smaller than 1 in the paper's graphic. I'm pretty sure some rescaling is going on that's not explicit in the paper. However, I was able to recover the correct value of epsilon using Eb(k). As long as you're able to compute epsilon correctly, I wouldn't worry too much about it.

answered Oct 25 '22 by bpachev