I know that there are similar questions around on Stackoverflow but I don't feel they answer the following.
Graph Databases to my understanding store data following mostly this schema:
Table/Collection 1: store nodes with UID Table/Collection 2: store relations referencing nodes via UID
This allows storing arbitrary types of graphs. Now as I understand triple stores store nothing but triples:
Triple/Collection 1: store triples (2 nodes, 1 relation)
Now I would see the following distinction regarding use cases:
I am confused by the fact that people do not seem to be discussing which one to use according to these criteria. Most article I find are talking about arguments like speed or compatibility. But is this not the most relevant point?
Put the other way round:
subject
.EDIT: I see that "loosing info about connections" is the wrong way to put it. If you do as shown in the accepted answer and insert several triples for 2 nodes + 1 relation then you keep all the info and specifically the info what exact nodes are connected.
Graph databases have advantages for use cases such as social networking, recommendation engines, and fraud detection, when you need to create relationships between data and quickly query these relationships. The following graph shows an example of a social network graph.
The RDF triplestore is a type of graph database that stores semantic facts. RDF, which stands for Resource Description Framework, is a model for data publishing and interchange on the Web standardized by W3C.
Since the Neo4j graph database is not a triple store, it is not equipped with a SPARQL query engine.
The main difference between graph databases and triple stores is how they model the graph. In a triple store (or quad store), the data tends to be very atomic. What I mean is that the "nodes" in the graph tend to be primitive data types like string, integer, date, etc. Relationships link primitives together, and so the "unit of discourse" in a triple store is a triple, and not a node or a relationship, typically.
By contrast, other graph databases are often called "property stores" because nodes are data containers that correspond to objects in a domain. A node stands in for an object, and has properties; they act as rich data types specified by the graph modelers, more than just primitive data types. In these graph databases, nodes and relationships are the "unit of discourse".
Let's say I have a person named "Bob" who knows "Susan". In RDF, it would be something like this:
<http://example.org/person/1> :hasName "Bob". <http://example.org/person/1> foaf:knows <http://example.org/person/2>. <http://example.org/person/2> :hasName "Susan".
In a graph database like neo4j, it would be this:
(a:Person {name: "Bob"})-[:KNOWS]->(b:Person {name: "Susan"})
Notice that in RDF, it's 3 relationships but only one of those relationships actually expresses semantics between two entities. The other two relationships are just tracking properties of a single higher-level entity (the person). In neo4j, it's 1 relationship amongst two nodes, with each node having a property. In RDF you'll tend to identify things by URI, in neo4j it's a database object that gets a database ID automatically. That's what I mean about the difference between a more atomic/primitive store (triple stores) and a richer property graph.
RDF and triple stores are mostly built for the kinds of architectural challenges you'd run into with the semantic web. For example, XML namespacing is built in, on the architectural assumption that you'll be mixing and matching the use of many different vocabularies and namespaces. (That right there is a very "semantic web" assumption). So in SPARQL and RDF you'll see typically at least the use of xsd
, rdf
, and rdfs
namespaces concurrently, and probably also owl
, skos
, and many others. SPARQL and RDF/RDFS also have many hooks and features that are there explicitly to make things like ontology inference easier. You'll tend to identify things with URIs as a way of "namespacing your identifiers" but also because some people may want to de-reference the URI...again the assumption here is a wide data sharing arrangement between many parties.
Property stores by contrast are keyed towards different use cases, like flexible modeling of data within one model/namespace, mappings between objects and graphs for persistence of enterprise applications, rapid evolvability, and so on. You'll tend to identify things with your own scheme (or an internal database ID). An auto-incrementing integer may not be best form of ID for any random consumer on the web, (and they certainly can't be de-referenced like URLs) but they might not be your first thought for a company internal application.
So which is better? The more atomic triple store format, or a rich property graph? Do you need to mix and match many different vocabularies in one query or data model? Do you need to create an OWL ontology or do inference? Do you need to serialize a bunch of java objects in memory to a database? Do you need to do fast traversal of long paths? Those types of questions would guide your selection.
Graphs are graphs, both of them do graphs, and so I don't think there's much difference in terms of what they can represent, or how you go about thinking about a problem in "graph terms". The differences boil down to the architecture underneath of the hood, and what sorts of use cases you think you'll need. I won't tell you one is better than the other, but choose wisely.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With