To follow up on my question on modeling relational data with nosql, I have read several articles on the subject:
Nosql doesn't mean non-relational
Nosql Ecommerce Example
They seem to suggest that nosql can handle normalized, relational data.
So let's continue with the example I had before, a CMS system that have two types of data: article and authors, where article has an reference (by ID) to author.
Below are the operations the system needs to support:
I would like to understand the performance of these operation when compare to the same operation if the same data were stored on RDBMS. In particular, please specify if the operation uses MapReduce, require multple trips to the nosql store (Links), or pre-join
I would like to limit to discussion to document-based nosql solution like mongodb, couchdb, and riak.
Edit 1:
Spring-data project is avalible on Riak and Mongodb
Just wanted to toss in a CouchDB answer for anyone who might be curious. :)
As mentioned in the first answer above, embedding the author document into the article document is unwise, so the examples below assume two document types: articles and authors.
CouchDB uses MapReduce queries typically written in JavaScript (but Python, Ruby, Erlang and others are availble). The results of a MapReduce query are stored in an index upon their first request and that stored index is used to for all future look-ups. Changes to the database are added to the index upon further requests.
CouchDB's API is completely HTTP-based, so all requests to the database are HTTP verbs (GET, POST, PUT, DELETE) at various URLs. I'll be listing both the MapReduce queries (written in JavaScript) along with the URL used to request related results from the index.
1. Fetch a article by id along with the author
The simplest method for doing this is two direct document lookups:
GET /db/{article_id} GET /db/{author_id}
...where {author_id} is the value obtained from the article's author_id field.
2. Fetch all articles by particular author
MapReduce
function (doc) {
if (doc.type === 'article') {
emit(doc.author_id, doc);
}
}
GET /db/_design/cms/_view/articles_by_author?key="{author_id}"
...where {author_id} is the actual ID of the author.
3. Find the first 10 article(s) with the author(s) sorted by creation date
MapReduce
function (doc) {
function arrayDateFromTimeStamp(ts) {
var d = new Date(ts);
return [d.getFullYear(), d.getMonth(), d.getDate(), d.getHours(), d.getMinutes(), d.getSeconds()];
}
var newdoc = doc;
newdoc._id = doc.author_id;
newdoc.created_at = arrayDateFromTimeStamp(doc.created_at);
if (doc.type === 'article') {
emit(newdoc.created_at, newdoc);
}
}
It's possible to do include style "joins" in CouchDB using ?include_docs=true
in a view request. If you include a "_id" key in the value side of the emit (the second argument), then adding include_docs=true
to your query parameters will include the doc referenced by the specified "_id" In the case above, we're replacing the document's own "_id" (which we don't need anymore) with the referenced author's "_id" (the value of "author_id" in the article document). Requesting the top 10 articles with their related author info looks like this:
GET /db/_design/cms/_view/articles_by_date?descending=true&limit=10&include_docs=true
Requesting that URL will return a list of the most recent 10 articles in a format similar to:
{"rows":[ { "id":"article_id", "key":[2011, 9, 3, 12, 5, 41], "value":{"_id":"author_id", "title":"..."}, "doc":{"_id":"author_id", "name":"Author Name"} } ]}
Using this same index you can get a list of all the documents any any year, month, day, hour, etc granularity with or without author data.
There are also methods for using view collation to aggregate several documents together out of a single document (like a page in a CMS referencing disparate content). There's some info on how to do that in these slides I did for CouchConf in July: http://www.slideshare.net/Couchbase/couchconfsfdesigningcouchbasedocuments
If you have any other questions, please let me know.
Fetch a article by id along with the author
SQL:
MongoDB:
Fetch all articles by particular author
SQL:
MongoDB:
Find the first 10 article(s) with the author(s) sorted by creation date
SQL:
MongoDB:
articles.find().sort().limit(10)
, authors.find({$in:[article_authors]})
Summary
In two cases MongoDB requires an extra query, but does most of the same total work underneath. In some cases MongoDB returns less data over the network (no repeated entries). The join queries tend to be limited by the requirement that all the data to join live on the same box. If Authors and Articles live in different places, then you end up doing two queries anyways.
MongoDB tends to get better "raw" performance because it doesn't flush to disk with every write (so it's actually a "durability" tradeoff). It also has a much smaller query parser, so there's less activity per query.
From a basic performance standpoint these things are very similar. They just make different assumptions about your data and the trade-offs you want to make.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With