Suppose you have a large number of users (M) and a large number of documents (N) and you want each user to be able to mark each document as read or unread (just like any email system). What's the best way to represent this in MongoDB? Or any other document database?
There are several questions on StackOverflow asking this question for relational databases but I didn't see any with recommendations for document databases:
What's the most efficient way to remember read/unread status across multiple items?
Implementing an efficient system of "unread comments" counters
Typically the answers involve a table listing everything a user has read: (i.e. tuples of user id, document id) with some possible optimizations for a cut off date allowing mark-all-as-read to wipe the database and start again knowing that anything prior to that date is 'read'.
So, MongoDB / NOSQL experts, what approaches have you seen in practice to this problem and how did they perform?
{
_id: messagePrefs_uniqueId,
type: 'prefs',
timestamp: unix_timestamp
ownerId: receipientId,
messageId: messageId,
read: true / false,
}
{
_id: message_uniqueId,
timestamp: unix_timestamp
type: 'message',
contents: 'this is the message',
senderId: senderId,
recipients: [receipientId1,receipientId2]
}
Say you have 3 messages you want to retrieve preferences for, you can get them via something like:
db.messages.find({
messageId : { $in : [messageId1,messageId2,messageId3]},
ownerId: receipientId,
type:'prefs'
})
If all you need is read/unread you could use this with MongoDB's upsert capabilities, so you are not creating prefs for each message unless the user actually reads it, then basically you create the prefs object with your own unique id and upsert it into MongoDB. If you want more flexibility(like say tags or folders) you'll probably want to make the pref for each recipient of the message. For example you could add:
tags: ['inbox','tech stuff']
to the prefs object and then to get all the prefs of all the messages tagged with 'tech stuff' you'd go something like:
db.messages.find({type: 'prefs', ownerId: recipientId, tags: 'tech stuff'})
You could then use the messageIds you find within the prefs to query and find all the messages that correspond:
db.messages.find((type:'message', _id: { $in : [array of messageIds from prefs]}})
It might be a little tricky if you want to do something like counting how many messages each 'tag' contains efficiently. If it's only a handful of tags you can just add .count()
to the end of your query for each query. If it's hundreds or thousands then you might do better with a map/reduce server side script or maybe an object that keeps track of message counts per tag per user.
If you're only storing a simple boolean value, like read/unread, another method is to embedded an array in each Document that contains a list of the Users who have read it.
{
_id: 'document#42',
...
read_by: ['user#83', 'user#2702']
}
You should then be able to index that field, making for fast queries for Documents-read-by-User and Users-who-read-Document.
db.documents.find({read_by: 'user#83'})
db.documents.find({_id: 'document#42}, {read_by: 1})
However, I find that I'm usually querying for all Documents that have not been read by a particular User, and I can't think of any solution that can make use of the index in this case. I suspect it's not possible to make this fast without having both read_by
and unread_by
arrays, so that every User is included in every Document (or join table), but that would have a large storage cost.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With