How would one go about storing and querying sparse directed or undirected graphs in Postgresql. There is something like pggraph, but that is still in planning stage.
I realize dedicated graph databases like Neo4J are best suited for this. However is there way to implement same within Postgresql, by using extension or a data type, which would avoid adding another database engine.dtata
Apache AGE is a PostgreSQL extension that provides graph database functionality. The goal of Apache AGE is to enable users of Postgres to use graph query modeling in unison with Postgres' existing relational model.
You have to store Nodes (Vertices) in one table, and Edges referencing a FromNode and a ToNode to convert a graph data structure to a relational data structure. And you are also right, that this ends up in a large number of lookups, because you are not able to partition it into subgraphs, that might be queried at once.
Here, we collected some of the best free and open source graph databases. Neo4j is one of the most popular highly scalable and native graph database (written in Java) designed to leverage data relationships.
A graph database will always have a rudimentary schema consisting of (at least) Vertex and Edge objects, where an Edge can contain data about a particular relationship. The degree to which you can add to this schema varies widely across implementations.
In essence, there are some techniques to efficiently query graph data within an SQL database, that apply to highly specialized scenarios.
You could opt to maintain a GRIPP index, for instance, if your interests lie in finding shortest paths. (It basically works a bit like pre-ordered tree index, applied to graphs.) To the best of my knowledge, none of these techniques are standardized yet.
With that being said, and seeing your comment that mentions social networks, the odds are that each of them will be overkill.
If your interest primarily lies in fetching data related to a user's friends, or something equivalent in the sense that it amounts to querying a node's neighborhood, the number of nodes you'll need to traverse in joins is so tiny that there is no need for specialized tools, data structures, etc.: simply use recursive CTEs.
http://www.postgresql.org/docs/current/static/queries-with.html
WITH provides a way to write auxiliary statements for use in a larger query. These statements, which are often referred to as Common Table Expressions or CTEs, can be thought of as defining temporary tables that exist just for one query.
For optimal performance when using the latter, shift as many where
conditions within the with (...)
part of the query, so as to eliminate nodes early.
Use PostgreSQL for the underlying storage and use networkX or iGraph via PL/Python for the processing engine.
In their book Graph Databases, Ian Robinson, Jim Webber, and Emil Eifrem make a distinction between the underlying storage and the processing engine. If you look at the answer I followed in a recent problem (see here), you will see that I'm using PostgreSQL for the underlying storage and networkX as the processing engine. The performance gain relative to my original solution was huge (and similar to the ones described in the "Graph Databases" book) and implementing it was very easy.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With