
Graph DBs vs. Document DBs vs. Triplestores

This is a somewhat abstract and general question. I'm interested in the inherent (as well as implementation-specific) properties of different approaches to persist unstructured data with both lots of internal references (graph-like) and lots of properties (JSON-like).

  • Since a graph is a superset of a tree, you can look at graph DBs (e.g. Neo4j) as a superset of document DBs (e.g. MongoDB). That is, a graph DB provides all the functionality of a document DB and additionally allows cycles and has a native pointer type, so you don't have to dereference foreign keys/ids manually (see the sketch after this list). So is there a tipping point, as you add more references between your objects/resources, beyond which you're better off with a graph DB where previously a document store was the better fit? Are there advantages to document DBs (storage space, performance?), or should you always go with a graph DB just in case you'll need more references in the future?

  • Similarly, how do graph DBs and triplestores (e.g. RDF stores) compare? Graph DBs (where nodes and edges have properties) seem to be a superset of plain triplestores. So for what problems (if any) do triplestores actually perform better than, say, Neo4j? (One advantage of RDF stores is that there is a standardized query language – SPARQL – although there seem to be a lot of people who don't like SPARQL and would therefore call it a disadvantage.)
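
To make the first bullet concrete, here is a minimal sketch of the difference (plain Python, no particular database; the collection layout and names are invented for illustration): in a document model, related records are referenced by id and the application has to chase those ids itself, whereas a graph-style model holds direct references that a traversal simply follows.

```python
# Document-style: related records are referenced by id and must be
# looked up ("joined") by the application.
users = {
    "u1": {"name": "Alice", "friend_ids": ["u2", "u3"]},
    "u2": {"name": "Bob",   "friend_ids": ["u3"]},
    "u3": {"name": "Carol", "friend_ids": []},
}

def friends_of_friends(user_id):
    """Two-hop traversal: every id has to be dereferenced by hand."""
    result = set()
    for fid in users[user_id]["friend_ids"]:
        for ffid in users[fid]["friend_ids"]:
            result.add(users[ffid]["name"])
    return result

print(friends_of_friends("u1"))  # {'Carol'}

# Graph-style: nodes hold direct references (pointers) to their
# neighbours, so a traversal just follows edges.
class Node:
    def __init__(self, name):
        self.name = name
        self.friends = []  # list of Node objects, not ids

alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
alice.friends = [bob, carol]
bob.friends = [carol]

print({ff.name for f in alice.friends for ff in f.friends})  # {'Carol'}
```

The tipping point asked about is roughly the point where this hand-written id chasing (and the extra lookups it implies) starts to dominate your access patterns.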

I guess my question is: the graph model (with properties) seems able to neatly express all kinds of data, so what is the catch when you enter reality? I suppose the catch of graph DBs is performance, so I'd love to see some numbers or rules of thumb on what kind of slowdowns to expect when loading, querying and modifying data, as well as on memory and persistent-storage requirements (compared to document and triple stores). And what about horizontal scalability? I got the impression that the playing field there is quite level.

Do you think it is possible that graphs, with their expressiveness, will become the new default storage model for projects that don't have super-large data, or are we doomed to a decade of Polyglot Persistence, with RDBMSs, JSON stores and graph DBs living alongside each other and having to be integrated with ever more glue code?

asked Aug 20 '12 by mb21


1 Answer

I'm not sure I would agree with the sentiment that a lot of people don't like SPARQL. SPARQL 1.0 did have some shortcomings, but it addressed what it was designed for quite nicely, and the new iteration, SPARQL 1.1, builds upon it, adding many constructs from SQL that people expected to see in the original spec, including sub-queries, aggregates and update semantics. I think the fact that it's a standard, so you can expect the same parsing and semantics in every triple store, as opposed to dialects of SQL, is a nice feature.
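
As an illustration of the 1.1 additions, here is a small aggregate query. This sketch uses the Python rdflib library purely as a convenient in-memory SPARQL engine (my choice of tooling, not something from the question), and the ex: vocabulary is invented; the same query text would work against any SPARQL 1.1-compliant store.

```python
from rdflib import Graph

ttl = """
@prefix ex: <http://example.org/> .
ex:alice ex:knows ex:bob, ex:carol .
ex:bob   ex:knows ex:carol .
ex:alice ex:name  "Alice" .
ex:bob   ex:name  "Bob" .
ex:carol ex:name  "Carol" .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

# SPARQL 1.1 aggregate (not available in 1.0): how many people does
# each person know?
query = """
PREFIX ex: <http://example.org/>
SELECT ?name (COUNT(?friend) AS ?friends)
WHERE {
  ?person ex:knows ?friend ;
          ex:name  ?name .
}
GROUP BY ?name
"""

for name, friends in g.query(query):
    print(name, friends)  # Alice 2, Bob 1
```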

I would also claim that all triple stores are graph databases; you can put properties on specific edges in RDF, albeit not as nicely as you can with Neo4j. But triple stores have the advantages of a real query language, a W3C-standard data representation that makes it trivial to move your data to another triplestore, and, for a number of triple stores, the ability to perform reasoning based on OWL.
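
To sketch what "properties on specific edges" can look like in RDF, one common pattern is to promote the relationship itself to a resource and attach the properties to it (the vocabulary below is invented for illustration; in a property graph like Neo4j you would instead put the attribute directly on the relationship):

```python
from rdflib import Graph

# The friendship is a resource of its own, so it can carry properties.
ttl = """
@prefix ex: <http://example.org/> .

ex:rel1 a ex:Friendship ;
        ex:src   ex:alice ;
        ex:dst   ex:bob ;
        ex:since "2009" .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

q = """
PREFIX ex: <http://example.org/>
SELECT ?src ?dst ?since WHERE {
  ?rel a ex:Friendship ; ex:src ?src ; ex:dst ?dst ; ex:since ?since .
}
"""
for src, dst, since in g.query(q):
    print(src, dst, since)  # http://example.org/alice http://example.org/bob 2009
```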

I don't know much about the scalability of most graph DBs, but generally the commercial RDF databases scale quite well. All can scale into the billions of triples, which handles a great many use cases, though how they handle scale differs wildly from vendor to vendor with respect to scale-up vs. scale-out, clustering, etc. You'll also see pretty different memory and hardware requirements to match each implementation. For me, I've tended to just grab an EC2 instance, usually a 2XL or 4XL, mount an EBS volume large enough to hold the data, and I'm pretty well set.

Additionally, some triple stores integrate with Lucene or similar technologies to provide inverted (full-text) indexes over the data, and many are now starting to include geospatial and temporal indexes. These are very useful features, and I'm not sure whether something like Neo4j offers them.

With that said, they're not going to scale as well as relational databases; they're just not as mature. But you're also not going to get screwed when you have "real" amounts of data either. Of course, one of the advantages of triple stores is reasoning, which is tricky to perform at scale, but that's much of the reason the various OWL profiles were created. Still, you can paint yourself into a corner if you don't think ahead.
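
To show what reasoning buys you in practice, here is a minimal sketch assuming the Python libraries rdflib and owlrl (my choice of tooling, and only RDFS-level entailment rather than a full OWL profile; the vocabulary is invented): one subclass axiom plus one fact lets you query a statement that was never asserted.

```python
from rdflib import Graph, Namespace, RDF, RDFS
import owlrl

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.Dog, RDFS.subClassOf, EX.Animal))  # schema: every Dog is an Animal
g.add((EX.rex, RDF.type, EX.Dog))            # data: rex is a Dog

# Materialize the RDFS entailments (a simple form of reasoning).
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

print((EX.rex, RDF.type, EX.Animal) in g)  # True, although never asserted
```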

I think graph databases, and triple stores specifically, can be a pretty good match for a lot of the applications being built, but I don't think that means everything should be done with them. Like anything else, they're tools with their good points and their bad points, so you have to make the right choice based on your application. But they probably always merit at least a consideration these days.

answered Sep 21 '22 by Michael