Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the most efficient graph data structure in Python? [closed]

I need to be able to manipulate a large (10^7 nodes) graph in python. The data corresponding to each node/edge is minimal, say, a small number of strings. What is the most efficient, in terms of memory and speed, way of doing this?

A dict of dicts is more flexible and simpler to implement, but I intuitively expect a list of lists to be faster. The list option would also require that I keep the data separate from the structure, while dicts would allow for something of the sort:

graph[I][J]["Property"]="value" 

What would you suggest?


Yes, I should have been a bit clearer on what I mean by efficiency. In this particular case I mean it in terms of random access retrieval.

Loading the data in to memory isn't a huge problem. That's done once and for all. The time consuming part is visiting the nodes so I can extract the information and measure the metrics I'm interested in.

I hadn't considered making each node a class (properties are the same for all nodes) but it seems like that would add an extra layer of overhead? I was hoping someone would have some direct experience with a similar case that they could share. After all, graphs are one of the most common abstractions in CS.

like image 433
bgoncalves Avatar asked Aug 04 '08 12:08

bgoncalves


People also ask

What is the most efficient data structure in Python?

Almost certainly, the first iterable data structure any Python programmer has come across is a list . It has a strong first impression of being super flexible, and suitable in almost any scenario in our daily coding routine.

Which Python's data structure is the best to work with graphs?

The data structure I've found to be most useful and efficient for graphs in Python is a dict of sets. This will be the underlying structure for our Graph class. You also have to know if these connections are arcs (directed, connect one way) or edges (undirected, connect both ways).

Which data structure would be most efficient?

Arrays. An array is a linear data structure that holds an ordered collection of values. It's the most efficient in storing and accessing a sequence of objects.


2 Answers

I would strongly advocate you look at NetworkX. It's a battle-tested war horse and the first tool most 'research' types reach for when they need to do analysis of network based data. I have manipulated graphs with 100s of thousands of edges without problem on a notebook. Its feature rich and very easy to use. You will find yourself focusing more on the problem at hand rather than the details in the underlying implementation.

Example of Erdős-Rényi random graph generation and analysis

 """ Create an G{n,m} random graph with n nodes and m edges and report some properties.  This graph is sometimes called the Erd##[m~Qs-Rényi graph but is different from G{n,p} or binomial_graph which is also sometimes called the Erd##[m~Qs-Rényi graph. """ __author__ = """Aric Hagberg ([email protected])""" __credits__ = """""" #    Copyright (C) 2004-2006 by  #    Aric Hagberg  #    Dan Schult  #    Pieter Swart  #    Distributed under the terms of the GNU Lesser General Public License #    http://www.gnu.org/copyleft/lesser.html  from networkx import * import sys  n=10 # 10 nodes m=20 # 20 edges  G=gnm_random_graph(n,m)  # some properties print "node degree clustering" for v in nodes(G):     print v,degree(G,v),clustering(G,v)  # print the adjacency list to terminal  write_adjlist(G,sys.stdout) 

Visualizations are also straightforward:

enter image description here

More visualization: http://jonschull.blogspot.com/2008/08/graph-visualization.html

like image 145
Ryan Cox Avatar answered Sep 22 '22 17:09

Ryan Cox


Even though this question is now quite old, I think it is worthwhile to mention my own python module for graph manipulation called graph-tool. It is very efficient, since the data structures and algorithms are implemented in C++, with template metaprograming, using the Boost Graph Library. Therefore its performance (both in memory usage and runtime) is comparable to a pure C++ library, and can be orders of magnitude better than typical python code, without sacrificing ease of use. I use it myself constantly to work with very large graphs.

like image 39
Tiago Peixoto Avatar answered Sep 19 '22 17:09

Tiago Peixoto