I'm working on simulations of large-scale neuronal networks, for which I need to generate random graphs that represent the network topology.
I'd like to be able to specify the following properties of these graphs:
Ideally, the random graphs should be drawn uniformly from the set of all possible graphs that satisfy these user-specified criteria.
At the moment I'm using a very crude random diffusion approach where I start out with an Erdos-Renyi random network with the desired size and global connection probability, then on each step I randomly rewire some fraction of the edges. If the rewiring got me closer to the desired C then I keep the rewired network into the next iteration.
Here's my current Python implementation:
import igraph
import numpy as np
def generate_fixed_gcc(n, p, target_gcc, tol=1E-3):
"""
Creates an Erdos-Renyi random graph of size n with a specified global
connection probability p, which is then iteratively rewired in order to
achieve a user- specified global clustering coefficient.
"""
# initialize random graph
G_best = igraph.Graph.Erdos_Renyi(n=n, p=p, directed=True, loops=False)
loss_best = 1.
n_edges = G_best.ecount()
# start with a high rewiring rate
rewiring_rate = n_edges
n_iter = 0
while loss_best > tol:
# operate on a copy of the current best graph
G = G_best.copy()
# adjust the number of connections to rewire according to the current
# best loss
n_rewire = min(max(int(rewiring_rate * loss_best), 1), n_edges)
G.rewire(n=n_rewire)
# compute the global clustering coefficient
gcc = G.transitivity_undirected()
loss = abs(gcc - target_gcc)
# did we improve?
if loss < loss_best:
# keep the new graph
G_best = G
loss_best = loss
gcc_best = gcc
# increase the rewiring rate
rewiring_rate *= 1.1
else:
# reduce the rewiring rate
rewiring_rate *= 0.9
n_iter += 1
# get adjacency matrix as a boolean numpy array
M = np.array(G_best.get_adjacency().data, dtype=np.bool)
return M, n_iter, gcc_best
This is works OK for small networks (N < 500), but it quickly becomes intractable as the number of nodes increases. It takes on the order of about 20 sec to generate a 200 node graph, and several days to generate a 1000 node graph.
Can anyone suggest an efficient way to do this?
You are right. That is a very expensive method to achieve what you want. I can only speculate if there is a mathematically sound way to optimize and ensure that it is close to being a uniform distribution. I'm not even sure that your method leads to a uniform distribution although it seems like it would. Let me try:
Based on the docs for transitivity_undirected
and wikipedia Clustering Coefficient, it sounds like it is possible to make changes locally in the graph and at the same time know the exact effect on global connectivity and global clustering.
The global clustering coefficient is based on triplets of nodes. A triplet consists of three nodes that are connected by either two (open triplet) or three (closed triplet) undirected ties. A triangle consists of three closed triplets, one centred on each of the nodes. The global clustering coefficient is the number of closed triplets (or 3 x triangles) over the total number of triplets (both open and closed).
( * edit * ) Based on my reading of the paper referenced by ali_m, the method below will probably spend too many edges on low-degree clusters, leading to a graph that cannot achieve the desired clustering coefficient unless it is very low (which probably wouldn't be useful anyway). Therefore, on the off chance that somebody actually uses this, you will want to identify higher degree clusters to add edges to in order to quickly raise the clustering coefficient without needing to add a lot of edges.
On the other hand, the method below does align with the methods in the research paper so it's more or less a reasonable approach.
If I understand it correctly, you could do the following:
Produce the graph as you have done.
Calculate and Track:
p_surplus
to track the number of edges that need to be added or removed elsewhere to maintain connectivitycc_top
, cc_btm
to track the clustering coefficientIteratively (not completely) choose random pairs and connect or disconnect them to monotonically approach the Clustering Coefficient (cc) you want while maintaining the Connectivity (p) you already have.
Pseudo code:
for random_pair in random_pairs:
if (random_pair is connected) and (need to reduce cc or p): # maybe put priority on the one that has a larger gap?
delete the edge
p_surplus -= 1
cc_top -= broken_connected_triplets # have to search locally
cc_btm -= (broken_connected_triplets + broken_open_triplets) # have to search locally
elif (random_pair is not connected) add (need to increase c or p):
add the edge
p_surplus += 1
cc_top += new_connected_triplets
cc_btm += (new_connected_triplets + new_open_triplets)
if cc and p are within desired ranges:
done
if some condition for detecting infinite loops:
rethink this method
That may not be totally correct, but I think the approach will work. The efficiency of searching for local triplets and always moving your parameters in the right direction will be better than copying the graph and globally measuring the cc so many times.
Having done a bit of reading, it looks as though the best solution might be the generalized version of Gleeson's algorithm presented in this paper. However, I still don't really understand how to implement it, so for the time being I've been working on Bansal et al's algorithm.
Like my naive approach, this is a Markov chain-based method that uses random edge swaps, but unlike mine it specifically targets 'triplet motifs' within the graph for rewiring:
Since this will have a greater tendency to introduce triangles, it will therefore have a greater impact on the clustering coefficient. At least in the case of undirected graphs, the rewiring step is also guaranteed to preserve the degree sequence. Again, on every rewiring iteration the new global clustering coefficient is measured, and the new graph is accepted if the GCC got closer to the target value.
Bansal et al actually provided a Python implementation, but for various reasons I ended up writing my own version, which you can find here.
The Bansal approach takes just over half the number of iterations and half the total time compared with my naive diffusion method:
I was hoping for bigger gains, but a 2x speedup is better than nothing.
One remaining challenge with the Bansal method is that my graphs are directed, whereas Bansal et al's algorithm is only designed to work on undirected graphs. With a directed graph, the rewiring step is no longer guaranteed to preserve the in- and out-degree sequences.
I've just figured out how to generalize the Bansal method to preserve both the in- and out-degree sequences for directed graphs. The trick is to select motifs where the two outward edges to be swapped have opposite directions (the directions of the edges between {x, y1} and {x, y2} don't matter):
I've also made some more optimizations, and the performance is starting to look a bit more respectable - it takes roughly half the number of iterations and half the total time compared with the diffusion approach. I've updated the graphs above with the new timings.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With