Graph Databases vs Triple Stores - when to use which?

Tags: graph-databases, orientdb, neo4j, sparql, triplestore

I know that there are similar questions around on Stack Overflow, but I don't feel they answer the following.

Graph Databases to my understanding store data following mostly this schema:

Table/Collection 1: store nodes with UID
Table/Collection 2: store relations referencing nodes via UID

This allows storing arbitrary types of graphs. Now, as I understand it, triple stores store nothing but triples:

Triple/Collection 1: store triples (2 nodes, 1 relation)

Now I would see the following distinction regarding use cases:

Graph Databases: when you have known, static connections
Triple Stores: when you have loosely connected nodes and are often looking for new connections

I am confused by the fact that people do not seem to be discussing which one to use according to these criteria. Most articles I find talk about arguments like speed or compatibility. But is this not the most relevant point?

Put the other way round:

Imagine having a clearly connected, user-defined graph. Why on earth would you want to store that as triples only, losing all the info about connections, or having to implement some custom solution storing IDs in the triple subject?
Imagine having loosely collected nodes that you want to query for unknown relations using SPARQL. Graph databases do support that, but I assume they have to build another index for it and would be slower?

EDIT: I see that "losing info about connections" is the wrong way to put it. If you do as shown in the accepted answer and insert several triples for 2 nodes + 1 relation, then you keep all the info, specifically which exact nodes are connected.

495

asked May 11 '15 11:05

B M

1 Answer

The main difference between graph databases and triple stores is how they model the graph. In a triple store (or quad store), the data tends to be very atomic: the "nodes" in the graph tend to be either bare identifiers (URIs) or primitive values such as strings, integers, and dates. Relationships link these atomic pieces together, so the "unit of discourse" in a triple store is typically a triple, not a node or a relationship.
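As a rough illustration of the triple being the unit you work with, here is a minimal Python sketch of pattern matching over a list of triples. The `match` helper and the identifiers in the sample data are made up for this example; a real triple store would answer such patterns from indexes through SPARQL rather than by scanning a list.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the given pattern.

    None acts as a wildcard, loosely like a variable in a SPARQL pattern.
    """
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Hypothetical sample data: two resources, two literal values, one link.
triples = [
    ("ex:person/1", "ex:hasName", "Bob"),
    ("ex:person/1", "foaf:knows", "ex:person/2"),
    ("ex:person/2", "ex:hasName", "Susan"),
]

# Which predicates connect person/1 to person/2? The predicate is left open.
print(match(triples, subject="ex:person/1", obj="ex:person/2"))
```

Leaving the predicate open is the in-memory analogue of asking "how are these two things related?" without knowing the relation in advance.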

By contrast, other graph databases are often called "property stores" because nodes are data containers that correspond to objects in a domain. A node stands in for an object, and has properties; they act as rich data types specified by the graph modelers, more than just primitive data types. In these graph databases, nodes and relationships are the "unit of discourse".

Let's say I have a person named "Bob" who knows "Susan". In RDF, it would be something like this:

<http://example.org/person/1> :hasName "Bob".
<http://example.org/person/1> foaf:knows <http://example.org/person/2>.
<http://example.org/person/2> :hasName "Susan".

In a graph database like neo4j, it would be this:

(a:Person {name: "Bob"})-[:KNOWS]->(b:Person {name: "Susan"})

Notice that in RDF, it's 3 relationships, but only one of those relationships actually expresses semantics between two entities. The other two relationships just track properties of a single higher-level entity (the person). In neo4j, it's 1 relationship between two nodes, with each node having a property. In RDF you'll tend to identify things by URI; in neo4j, each node is a database object that gets a database ID automatically. That's what I mean about the difference between a more atomic/primitive store (triple stores) and a richer property graph.
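To make the contrast concrete, here is a minimal Python sketch that holds the Bob/Susan example as a small property graph and then flattens it into triples. This is not how either kind of database stores data internally. The `Node` and `Relationship` classes, the `to_triples` helper, and the `ex:` identifiers are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A property-graph node: an identity plus a bag of properties."""
    node_id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed edge between two nodes."""
    start: Node
    rel_type: str
    end: Node


def to_triples(nodes, relationships):
    """Flatten a property graph into (subject, predicate, object) triples.

    Every property becomes its own triple, and every relationship becomes
    one more triple linking two node identifiers.
    """
    triples = []
    for node in nodes:
        subject = f"ex:{node.label.lower()}/{node.node_id}"
        for key, value in node.properties.items():
            triples.append((subject, f"ex:{key}", value))
    for rel in relationships:
        triples.append((
            f"ex:{rel.start.label.lower()}/{rel.start.node_id}",
            f"ex:{rel.rel_type.lower()}",
            f"ex:{rel.end.label.lower()}/{rel.end.node_id}",
        ))
    return triples


bob = Node(1, "Person", {"name": "Bob"})
susan = Node(2, "Person", {"name": "Susan"})
knows = Relationship(bob, "KNOWS", susan)

triples = to_triples([bob, susan], [knows])
for triple in triples:
    print(triple)
```

Two nodes with one property each plus one relationship come out as three triples, which mirrors the count in the RDF version of the example.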

RDF and triple stores are mostly built for the kinds of architectural challenges you'd run into with the semantic web. For example, XML-style namespacing is built in, on the architectural assumption that you'll be mixing and matching many different vocabularies and namespaces. (That right there is a very "semantic web" assumption.) So in SPARQL and RDF you'll typically see at least the xsd, rdf, and rdfs namespaces used together, and probably also owl, skos, and many others. SPARQL and RDF/RDFS also have many hooks and features that are there explicitly to make things like ontology inference easier. You'll tend to identify things with URIs as a way of "namespacing your identifiers", but also because some people may want to de-reference the URI. Again, the assumption here is a wide data-sharing arrangement between many parties.
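As a small illustration of what "namespacing your identifiers" amounts to, here is a Python sketch that expands prefixed names into full URIs. The `expand` helper and the `ex` namespace are assumptions made for this example; the rdf, rdfs, xsd, and foaf namespace URIs are the published ones.

```python
# Well-known namespace URIs; the "ex" entry is a made-up example namespace.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:knows' into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


print(expand("foaf:knows"))
print(expand("xsd:string"))
```

Because every term expands to a globally unique URI, two parties can use the same short local name in different vocabularies without colliding.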

Property stores, by contrast, are keyed towards different use cases, like flexible modeling of data within one model/namespace, mappings between objects and graphs for persistence of enterprise applications, rapid evolvability, and so on. You'll tend to identify things with your own scheme (or an internal database ID). An auto-incrementing integer may not be the best form of ID for any random consumer on the web (and it certainly can't be de-referenced like a URL), but URIs might not be your first thought for a company-internal application either.

So which is better? The more atomic triple store format, or a rich property graph? Do you need to mix and match many different vocabularies in one query or data model? Do you need to create an OWL ontology or do inference? Do you need to serialize a bunch of Java objects in memory to a database? Do you need to do fast traversal of long paths? Those types of questions would guide your selection.
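The traversal question is easier to picture with a toy example. The Python sketch below is an in-memory illustration only, not a claim about how any particular database is implemented: it indexes a list of triples by subject so that each hop is a single dictionary lookup, then runs a breadth-first search for a shortest path. The `build_adjacency` and `shortest_path` helpers and the sample names are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(triples):
    """Index triples by subject so each hop is a single dictionary lookup."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the nodes on a shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _predicate, neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical sample data: a short chain of "knows" links plus one branch.
triples = [
    ("bob", "knows", "susan"),
    ("susan", "knows", "carol"),
    ("carol", "knows", "dave"),
    ("bob", "knows", "erin"),
]
adjacency = build_adjacency(triples)
print(shortest_path(adjacency, "bob", "dave"))
```

How cheaply a store can take each of those hops on long paths is one of the things worth measuring for your own workload before choosing.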

Graphs are graphs. Both of them do graphs, so I don't think there's much difference in terms of what they can represent, or how you go about thinking about a problem in "graph terms". The differences boil down to the architecture under the hood, and what sorts of use cases you think you'll need. I won't tell you one is better than the other, but choose wisely.

143

answered Oct 13 '22 02:10

FrobberOfBits
