Specifically a Multigraph.
Some colleague suggested this and I'm completely baffled.
Any insights on this?
You have to store Nodes (Vertices) in one table, and Edges referencing a FromNode and a ToNode to convert a graph data structure to a relational data structure. And you are also right, that this ends up in a large number of lookups, because you are not able to partition it into subgraphs, that might be queried at once.
relational database. The most notable difference between the two is that graph databases store the relationships between data as data. Relational databases infer a focus on relationships between data but in a different way.
Graph Databases will replace RDBMS technologies.
It's pretty straightforward to store a graph in a database: you have a table for nodes, and a table for edges, which acts as a many-to-many relationship table between the nodes table and itself. Like this:
create table node (
id integer primary key
);
create table edge (
start_id integer references node,
end_id integer references node,
primary key (start_id, end_id)
);
However, there are a couple of sticky points about storing a graph this way.
Firstly, the edges in this scheme are naturally directed - the start and end are distinct. If your edges are undirected, then you will either have to be careful in writing queries, or store two entries in the table for each edge, one in either direction (and then be careful writing queries!). If you store a single edge, i would suggest normalising the stored form - perhaps always consider the node with the lowest ID to be the start (and add a check constraint to the table to enforce this). You could have a genuinely unordered representation by not having the edges refer to the nodes, but rather having a join table between them, but that doesn't seem like a great idea to me.
Secondly, the schema above has no way to represent a multigraph. You can extend it easily enough to do so; if edges between a given pair of nodes are indistinguishable, the simplest thing would be to add a count to each edge row, saying how many edges there are between the referred-to nodes. If they are distinguishable, then you will need to add something to the node table to allow them to be distinguished - an autogenerated edge ID might be the simplest thing.
However, even having sorted out the storage, you have the problem of working with the graph. If you want to do all of your processing on objects in memory, and the database is purely for storage, then no problem. But if you want to do queries on the graph in the database, then you'll have to figure out how to do them in SQL, which doesn't have any inbuilt support for graphs, and whose basic operations aren't easily adapted to work with graphs. It can be done, especially if you have a database with recursive SQL support (PostgreSQL, Firebird, some of the proprietary databases), but it takes some thought. If you want to do this, my suggestion would be to post further questions about the specific queries.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With