
How would you architect a blog using a document store (such as CouchDB, Redis, MongoDB, Riak, etc.)?

I'm slightly embarrassed to admit it, but I'm having trouble conceptualizing how to architect data in a non-relational world, especially given that most document/KV stores have slightly different features.

I'd like to learn from a concrete example, but I haven't been able to find anyone discussing how you would architect, for example, a blog using CouchDB/Redis/MongoDB/Riak/etc.

There are a number of questions which I think are important:

  1. Which bits of data should be denormalised (e.g. tags probably live with the document, but what about users)?
  2. How do you link between documents?
  3. What's the best way to create aggregate views, especially ones which require sorting (such as a blog index)?
sh1mmer asked Oct 14 '22 21:10



2 Answers

First of all, I think you would want to remove Redis from the list, as it is a key-value store rather than a document store. Riak is also a key-value store, but it can act as a document store with a library like Ripple.

In brief, modelling an application with a document store comes down to figuring out:

  1. What data you would store in its own document and have other documents refer to. If a piece of data is going to be used by many other documents, it makes sense to model it as its own document. You must also consider how you will query it. If you are going to query it often, it is a good idea to store it in its own document, because querying over embedded documents is harder.
    • For example, assuming you have multiple Blog instances, Blog and Article should each be their own document, even though an Article could be embedded inside a Blog document.
    • Another example is User and Role. It makes sense to have separate documents for these. In my case I often query over users, and that is easier when User is its own document.
  2. What data you would want to store (embed) inside another document. If a piece of data belongs to exactly one document, then it 'might' be a good option to embed it inside that document.

    • Comments sometimes make more sense embedded inside the article they belong to:

    { article : { comments : [{ content: 'yada yada', timestamp: '20/11/2010' }] } }

    Another caveat to consider is how large the embedded data will grow, because MongoDB caps the size of a whole document, embedded documents included. The cap is 16MB in current versions and was smaller in early releases.

  3. What data should be a plain array, e.g.:
    • Tags make sense stored as an array: { article: { tags: ['news','bar'] } }
    • Or if you want to store multiple ids, e.g. a User with multiple roles: { user: { role_ids: [1,2,3] } }
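
The three choices above can be sketched with plain Python dicts standing in for documents. This is only an illustration, not any particular store's API: the `_id`, `blog_id` and `author_id` field names, the ISO date format, and the `add_comment` helper with its `max_bytes` parameter are all assumptions made for the example, and the JSON size is only a rough proxy for what a real store would measure.

```python
import json

# Hypothetical documents; every field name here is an assumption.
user = {"_id": "user:1", "name": "alice", "role_ids": [1, 2, 3]}  # ids as a plain array

article = {
    "_id": "article:1",
    "blog_id": "blog:1",       # reference: Blog lives in its own document
    "author_id": user["_id"],  # reference: User lives in its own document
    "title": "Hello",
    "tags": ["news", "bar"],   # plain array
    "comments": [              # embedded: each comment belongs to this article only
        {"content": "yada yada", "timestamp": "2010-11-20"},
    ],
}


def encoded_size(doc):
    """Rough size in bytes of the JSON encoding (a real store may measure differently)."""
    return len(json.dumps(doc).encode("utf-8"))


def add_comment(doc, content, timestamp, max_bytes):
    """Return a copy of doc with one more embedded comment.

    Raises ValueError if the result would exceed max_bytes, which is the
    point at which comments should move to their own documents.
    """
    comment = {"content": content, "timestamp": timestamp}
    candidate = dict(doc, comments=doc["comments"] + [comment])
    if encoded_size(candidate) > max_bytes:
        raise ValueError("article too large; store comments as separate documents")
    return candidate


updated = add_comment(article, "nice post", "2010-11-21", max_bytes=1_000_000)
```

Pass whatever document size limit your store actually enforces as `max_bytes`; the value used here is arbitrary.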

This is a brief overview of modelling with a document store. Good luck.

Joshua Partogi answered Oct 24 '22 22:10



  1. Deciding which objects should be independent and which should be embedded in other objects is mostly a matter of balancing read and write performance and effort. If a child object is independent, updating it means changing only one document, but when reading the parent object you have only ids and need additional queries to get the data. If the child object is embedded, all the data is right there when you read the parent document, but making a change requires finding and updating every document that embeds that object.

  2. Linking between documents isn't much different from SQL: you store an ID which is used to find the appropriate record. The key difference is that instead of filtering the child table to find records by parent id, you have a list of child ids in the parent document. For many-to-many relationships you would have a list of ids on both sides rather than a table in the middle.

  3. Query capabilities vary a lot between platforms, so there isn't a clear answer for how to approach this. However, as a general rule you will usually be setting up views/indexes when the document is written, rather than just storing the document and running ad-hoc queries later as you would with SQL.
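
As a rough sketch of points 2 and 3, here is a tiny in-memory stand-in for a document store. `TinyStore` and all of its method and field names are hypothetical and not the API of CouchDB, MongoDB or Riak. It shows id lists on both sides of a many-to-many link, and a date-sorted index that is updated on every write so the blog index never needs an ad-hoc sort.

```python
import bisect


class TinyStore:
    """In-memory stand-in for a document store (hypothetical, illustration only)."""

    def __init__(self):
        self.docs = {}
        # Index maintained at write time: sorted (published, article_id) pairs.
        self.articles_by_date = []

    def put(self, doc):
        old = self.docs.get(doc["_id"])
        if old is not None and old.get("type") == "article":
            # Drop the stale index entry before re-indexing the new version.
            self.articles_by_date.remove((old["published"], old["_id"]))
        self.docs[doc["_id"]] = doc
        if doc.get("type") == "article":
            bisect.insort(self.articles_by_date, (doc["published"], doc["_id"]))

    def get_many(self, ids):
        """Follow a list of ids, one lookup per linked document."""
        return [self.docs[i] for i in ids]

    def blog_index(self, limit=10):
        """Newest articles first, read straight from the prebuilt index."""
        newest = self.articles_by_date[::-1][:limit]
        return [self.docs[article_id] for _, article_id in newest]


store = TinyStore()
# Many-to-many link: each side keeps a list of the other side's ids.
store.put({"_id": "user:1", "type": "user", "name": "alice",
           "article_ids": ["article:1", "article:2"]})
store.put({"_id": "article:1", "type": "article", "title": "First",
           "published": "2010-11-20", "author_ids": ["user:1"]})
store.put({"_id": "article:2", "type": "article", "title": "Second",
           "published": "2010-11-21", "author_ids": ["user:1"]})

authored = store.get_many(store.docs["user:1"]["article_ids"])
titles = [a["title"] for a in store.blog_index()]
```

The cost of this design is the one described in point 1: the index and both id lists have to be kept in step on every write, in exchange for cheap reads.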

Tom Clarkson answered Oct 24 '22 23:10
