i am surprised that:
import igraph
import random, time
start_time = time.time()
G = igraph.Graph(directed = True)
G.add_vertices(10000)
for i in range(30000):
G.add_edge(random.randint(0,9999), random.randint(0,9999))
print "done in " + str(int(time.time() - start_time)) + " seconds"
returns done in 63 seconds
while
import igraph
import random, time
start_time = time.time()
G = igraph.Graph(directed = True)
G.add_vertices(10000)
edges = []
for i in range(30000):
edges += [(random.randint(0,9999), random.randint(0,9999))]
G.add_edges(edges)
print "done in " + str(int(time.time() - start_time)) + " seconds"
returns done in 0 seconds. Does someone from the project know why?
The reason is that igraph uses an indexed edge list as its data structure in the C layer. The index makes it possible to query the neighbors of a specific vertex in constant time. This is good if your graph rarely changes, but it becomes a burden when the modification operations are far more frequent than the queries, since whenever you add or remove an edge, you have to update the index. So, every call to add_edge
will make igraph reindex its internal data structures. The upside is that if you have to rebuild the index anyway, you can just as well add many edges using add_edges
with approximately the same cost. So, in your first code example, you rebuild the index 30000 times, while in the second example, you rebuild the index only once.
By the way, what you are doing could be done even faster using Graph.Erdos_Renyi(n=10000, m=30000)
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With