Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Graph-tool surprisingly slow compared to Networkx

After looking at the impressive performance comparison, I decided that I would give a try to graph-tool. So for comparison, I wrote codes to generate a random tree using both packages.

The graph-tool code:

import numpy as np
import graph_tool.all as gt

# construct an initial graph with two nodes and one link
n = 5000
G = gt.Graph(directed = False)
G.add_edge(0, 1)

for t in range(2, n):
    # connect the new vertex to one of the old vertices randomly
    G.add_edge(np.random.choice(range(t)), t)

The Networkx code:

import networkx as nx
import numpy as np

n = 5000
# initial graph
G = nx.Graph()
G.add_edge(0, 1)

for t in range(2, n):
    G.add_edge(t, np.random.choice(range(t)))

The graph-tool takes around 14 seconds on my 4-core machine whereas networkx takes less than 2 seconds on the same machine! Am I missing something obvious?

Thanks in advance.

like image 940
Peaceful Avatar asked Mar 24 '16 05:03

Peaceful


People also ask

Why is NetworkX so slow?

NetworkX, on the other hand, comes at a distant third with running times in the order of 40 to 250 times slower than graph-tool. This is mostly due to its pure Python implementation, which is known to be in general substantially slower than C/C++ (see here and here for further comparisons).

Is Igraph faster than NetworkX?

On the pokec dataset it takes just 0.2s to run the page rank algorithm (graph-tool: 1.7s, igraph: 59.6s, snap: 19.5s). For the k-core decomposition it is also 10 times faster than all other competitors or 2000 times networkx.

Can NetworkX handle large graphs?

NX is certainly capable of handling graphs that large, however, performance will largely be a function of your hardware setup. Aric will likely give a better answer, but NX loads graphs into memory at once, so in the ranges your are describing you will need a substantial amount of free memory for it to work.

Is NetworkX pure Python?

NetworkX is the only library that is written in pure Python, so it is considerably slower than the rest.


1 Answers

There is nothing surprising here. graph-tool achieves greater performance by off-loading main loops to C++. If all your main loops are in Python, it offers no advantage. The same is true for other libraries like numpy.

The proper way to achieve fast addition of edges is to let graph-tool perform the main loop. The network you are generating is a simple growth model, and can be achieved in graph-tool by calling:

G = price_network(n, gamma=0, directed=False)

which takes about 15 ms in my computer for n=5000.

Note also that your python code is unnecessarily slow, since you create new lists with all vertices at each iteration. A much faster version would be:

from numpy.random import randint
n = 5000
G = Graph(directed=False)
G.add_vertex(n)
G.add_edge(0, 1)
for i in range(2, n):
    G.add_edge(i, randint(i))

For even larger values of n, it will be even faster to add all edges at once instead of one by one, i.e.

from graph_tool.all import *
from numpy.random import randint
n = 5000
G = Graph(directed=False)
edges = [(0, 1)]
for i in range(2, n):
    edges.append((i, randint(i)))
G.add_edge_list(edges)
like image 96
Tiago Peixoto Avatar answered Oct 21 '22 20:10

Tiago Peixoto