I've considered creating a Vertices table and an Edges table but would building graphs in memory and traversing sub-graphs require a large number of lookups? I'd like to avoid excessive database reads. Is there any other way of persisting a graph?
Side note: I've heard of Neo4j but my question is really how to conceptually represent a graph in a standard database. I am open to some NoSQL solutions like mongodb though.
There are three ways to store a graph in memory: Nodes as objects and edges as pointers. A matrix containing all edge weights between numbered node x and node y. A list of edges between numbered nodes.
One way to structure data is to store it in tabular format (rows and columns), such as in spreadsheets or todo lists. Storing data in a structured way, such as in a table or a spreadsheet, allows us to find the data easily and also to manage it better.
The relational focus is between the columns of data tables, not data points. Both databases make adding new data easy. The flexibility of a graph database enables the ability to add new nodes and relationships between nodes, making it reliable for real-time data.
Vectors. It's the most common method for saving graph. For each vertex keep a vector of it's edges, now for each edge just save it in related vectors.
The answer is unfortunately: Your consideration is completely right in every point. You have to store Nodes (Vertices) in one table, and Edges referencing a FromNode and a ToNode to convert a graph data structure to a relational data structure. And you are also right, that this ends up in a large number of lookups, because you are not able to partition it into subgraphs, that might be queried at once. You have to traverse from Node to Edge to Node to Edge to Node...and so on (Recursively, while SQL is working with Sets).
The point is...
Relational, Graph oriented, Object oriented, Document based are different types of data structures that meet different requirements. Thats what its all about and why so many different NoSQL Databases (most of them are simple document stores) came up, because it simply makes no sense to organize big data in a relational way.
Alternative 1 - Graph oriented database
But there are also graph oriented NoSQL databases, which make the graph data model a first class citizen like OrientDB which I am playing around with a little bit at the moment. The nice thing about it is, that although it persists data as a graph, it still can be used in a relational or even object oriented or document oriented way also (i.e. by querying with plain old SQL). Nevertheless Traversing the graph is the optimal way to get data out of it for sure.
Alternative 2 - working with graphs in memory
When it comes to fast routing, routing frameworks like Graphhopper build up the complete Graph (Billions of Nodes) inside memory. Because Graphhopper uses a MemoryMapped Implementation of its GraphStore, that even works on Android Devices with only some MB of Memory need. The complete graph is read from database into memor at startup, and routing is then done there, so you have no need to lookup the database.
I faced this same issue and decided to finally go with the following structure, which requires 2 database queries, then the rest of the work is in memory:
Store nodes in a table and reference the graph with each node record:
Table Nodes id | title | graph_id --------------------- 105 | node1 | 2 106 | node2 | 2
Also store edges in another table and again reference the graph these edges belong to with each edge:
Table Edges id | from_node_id | to_node_id | graph_id ----------------------------------------- 1 | 105 | 106 | 2 2 | 106 | 105 | 2
Get all the nodes with one query, then get all the edges with another.
Now build your preferred way to store the graph (e.g., adjacency list) and proceed with your application flow.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With