
What are the best ways to store Graphs in persistent storage

I am wondering what the best ways to store graphs in persistent storage are, for later analysis, search, clustering, etc.

I see neo4j is one option, and I am curious whether other graph databases are available. Does anyone have any insight into how the larger social networks store their graph-based data (or how other sites store graph-like models, e.g. RDF)?

What about options like Cassandra, or MySQL?

nicoslepicos asked Jun 04 '10 05:06


People also ask

How do you store a graph in a relational database?

To convert a graph data structure to a relational one, store the nodes (vertices) in one table and the edges in another, with each edge referencing a FromNode and a ToNode. This results in a large number of lookups, because the graph cannot be partitioned into subgraphs that could each be queried at once.
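A minimal sketch of the two-table layout described above, using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names (nodes, edges, from_node, to_node) and the helper functions are illustrative assumptions, not taken from any particular schema. Note that the traversal issues one query per visited node, which is the lookup cost mentioned above.

```python
import sqlite3

def build_graph_db(edges):
    """Store a directed graph as a nodes table plus an edges table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT);
        CREATE TABLE edges (
            from_node INTEGER NOT NULL REFERENCES nodes(id),
            to_node   INTEGER NOT NULL REFERENCES nodes(id),
            PRIMARY KEY (from_node, to_node)
        );
    """)
    # Every endpoint that appears in an edge becomes a row in nodes.
    node_ids = sorted({n for edge in edges for n in edge})
    conn.executemany("INSERT INTO nodes (id, label) VALUES (?, ?)",
                     [(n, f"node-{n}") for n in node_ids])
    conn.executemany("INSERT INTO edges (from_node, to_node) VALUES (?, ?)",
                     edges)
    conn.commit()
    return conn

def neighbors(conn, node_id):
    """Return the direct successors of node_id (one lookup)."""
    rows = conn.execute(
        "SELECT to_node FROM edges WHERE from_node = ? ORDER BY to_node",
        (node_id,))
    return [row[0] for row in rows]

def reachable(conn, start):
    """Breadth-first traversal; issues one query per visited node."""
    seen = {start}
    frontier = [start]
    while frontier:
        next_frontier = []
        for node in frontier:
            for succ in neighbors(conn, node):
                if succ not in seen:
                    seen.add(succ)
                    next_frontier.append(succ)
        frontier = next_frontier
    return seen

conn = build_graph_db([(1, 2), (2, 3), (1, 3), (4, 1)])
print(neighbors(conn, 1))
print(sorted(reachable(conn, 4)))
```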

What is Native graph storage?

There are two main elements that distinguish native graph technology: storage and processing. Graph storage commonly refers to the underlying structure of the database that contains graph data. When built specifically for storing graph-like data, it is known as native graph storage.

How does neo4j store data?

Properties are stored as a linked list of property records, each holding a key and a value and pointing to the next property. Each node and each relationship references its first property record. Each node also references the first relationship in its relationship chain, and each relationship references its start and end node.
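The record layout described above can be modelled roughly as follows. This is a simplified in-memory sketch, not Neo4j's actual on-disk format: the class names, the NIL sentinel, and the start_next / end_next chain pointers are assumptions made for illustration (the text above only says that a relationship chain exists, not how it is linked).

```python
from dataclasses import dataclass

NIL = -1  # sentinel meaning "no record" (an assumption of this sketch)

@dataclass
class PropertyRecord:
    key: str
    value: object
    next_prop: int = NIL   # index of the next property in the chain

@dataclass
class RelationshipRecord:
    start_node: int
    end_node: int
    start_next: int = NIL  # next relationship in the start node's chain
    end_next: int = NIL    # next relationship in the end node's chain
    first_prop: int = NIL

@dataclass
class NodeRecord:
    first_rel: int = NIL
    first_prop: int = NIL

class Store:
    """Records live in flat lists; references are list indices."""
    def __init__(self):
        self.nodes, self.rels, self.props = [], [], []

    def add_node(self):
        self.nodes.append(NodeRecord())
        return len(self.nodes) - 1

    def set_property(self, record, key, value):
        # Prepend the new property to the record's property chain.
        self.props.append(PropertyRecord(key, value, record.first_prop))
        record.first_prop = len(self.props) - 1

    def add_relationship(self, start, end):
        # The new relationship becomes the head of both nodes' chains.
        self.rels.append(RelationshipRecord(
            start, end,
            start_next=self.nodes[start].first_rel,
            end_next=self.nodes[end].first_rel))
        rel_id = len(self.rels) - 1
        self.nodes[start].first_rel = rel_id
        self.nodes[end].first_rel = rel_id
        return rel_id

    def properties(self, record):
        """Walk the property chain; the most recently set key wins."""
        out, p = {}, record.first_prop
        while p != NIL:
            prop = self.props[p]
            out.setdefault(prop.key, prop.value)
            p = prop.next_prop
        return out

    def relationships(self, node_id):
        """Walk the node's relationship chain, newest first."""
        result, r = [], self.nodes[node_id].first_rel
        while r != NIL:
            rel = self.rels[r]
            result.append(r)
            r = rel.start_next if rel.start_node == node_id else rel.end_next
        return result
```

Because every reference is a direct index into a record list, following a relationship or property is a pointer hop rather than an index lookup, which is the point of this kind of layout.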


2 Answers

Graph Databases:

  1. HyperGraphDB: a general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism.
  2. InfoGrid: an Internet Graph Database with many additional software components that make the development of REST-ful web applications on a graph foundation easy.
  3. vertexdb: a high performance graph database server that supports automatic garbage collection.

Source: http://nosql.mypopescu.com/post/498705278/quick-review-of-existing-graph-databases

Graph Libraries:

  1. WebGraph is a framework to study the web graph. From their page - "It provides simple ways to manage very large graphs, exploiting modern compression techniques."
  2. Dex is a high performance library to manage very large graphs or networks.
  3. This blog post - On Building a Stupidly Fast Graph Database - provides some guidelines on building a graph database - the technique they use is "memory-mapped I/O, disk-based linear-hashing".
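To give a feel for the memory-mapped I/O part of that technique, here is a minimal sketch that stores edges as fixed-width records in a file and reads them back through Python's mmap module. It does not implement linear hashing and is not the blog post's design; the record layout (two little-endian uint32 values per edge) and the names write_edges and MappedEdgeFile are assumptions for illustration.

```python
import mmap
import struct

EDGE = struct.Struct("<II")  # (source, target) as two little-endian uint32

def write_edges(path, edges):
    """Write each edge as one fixed-width 8-byte record."""
    with open(path, "wb") as f:
        for src, dst in edges:
            f.write(EDGE.pack(src, dst))

class MappedEdgeFile:
    """Random access to edge records without reading the whole file.

    The OS pages data in on demand. Note that mmap cannot map an
    empty file, so the file must hold at least one edge.
    """
    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0,
                              access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map) // EDGE.size

    def edge(self, i):
        # The record offset is computed directly from the index.
        return EDGE.unpack_from(self._map, i * EDGE.size)

    def close(self):
        self._map.close()
        self._file.close()
```

Fixed-width records make the offset of edge i a simple multiplication, so a lookup touches only the page that holds that record.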
Susheel Javadi answered Oct 02 '22 02:10


Disclaimer: I am speaking from the graph analysis standpoint.

There are several file formats for storing graph data: GraphML, GXL and several others. But storage usually is not a problem. Working with the graphs without fully loading them into RAM is the tricky part.
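As a small illustration of working with a stored graph without building it in RAM, the sketch below streams a GraphML document with the standard library's iterparse and counts out-degrees as edges go by. The function name out_degrees and the sample document are made up for this example; only the counters stay in memory, though the parser still keeps the emptied element shells attached to the root.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"

def out_degrees(source):
    """Count outgoing edges per node while streaming a GraphML file.

    `source` may be a filename or a binary file object.
    """
    degrees = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == GRAPHML_NS + "edge":
            degrees[elem.get("source")] += 1
        # Drop the element's attributes and children once processed.
        elem.clear()
    return degrees

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="a"/><node id="b"/><node id="c"/>
    <edge source="a" target="b"/>
    <edge source="a" target="c"/>
    <edge source="b" target="c"/>
  </graph>
</graphml>
"""

print(dict(out_degrees(io.BytesIO(SAMPLE))))
```

Degree counting works in a single pass; algorithms that need random access to neighbours (shortest paths, clustering) are where streaming stops being enough.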

The RDF model is too generic for serious graph analysis. If you don't mind your analysis being slow and programming the algorithms yourself, go with one of the existing graph databases (see Wikipedia on this).

For real analysis, load all the data into RAM using an existing graph analysis library such as SNAP, or see this question.

Viesturs answered Oct 02 '22 01:10