 

MongoDB storing user-specific data on shared collection objects

I'm designing an application that processes RSS feeds using MongoDB. Currently my collections are as follows:

Entry
fields: content, feed_id, title, publish_date, url

Feed
fields: description, title, url

User
fields: email_address
subscriptions (embedded collection; fields: feed_id, tags)

A user can subscribe to feeds, which are referenced by feed_id from the embedded subscriptions collection. From the subscriptions I can get a list of all the feeds a user should see, and from those the corresponding entries.

How should I store entry status information (isRead, isStarred, etc.) that is specific to a user? When a user views an entry, I need to record isRead = 1. Two common queries I need to be able to perform are:

  • Find all entries for a specific feed where isRead = 0 or no status exists currently
  • For a specific user, mark all entries prior to a publish date with isRead = 1 (this could be hundreds or even thousands of records, so it must be efficient)
asked Mar 18 '26 01:03 by Josh Rickard


1 Answer

Hmm, this is a tricky one!

It makes sense to me to store a record only for entries that are unread, and to delete that record once the entry has been read. I'm assuming that, for each individual user, read entries will far outnumber unread ones, so there's no point keeping a document for every already-read entry sitting around in your DB forever. It also makes the 16MB document size limit much less of a worry, since you aren't dragging years of read history around with you everywhere.
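As a rough sketch of the delete-on-read idea, assuming the per-subscription unread_entries layout proposed below and assuming each UnreadEntry keeps the Entry's ObjectId in `_id`: recording "isRead = 1" when a user views an entry becomes a single `$pull` of that id. The helper name `mark_read_update` is hypothetical, and plain strings stand in for ObjectIds.

```ruby
# Hypothetical helper: builds the filter/update pair that marks one entry as
# read for one user by removing it from that feed's unread_entries array.
# Strings stand in for BSON ObjectIds.
def mark_read_update(user_id, feed_id, entry_id)
  filter = {
    "_id" => user_id,
    "subscriptions.feed_id" => feed_id # needed so the positional $ can locate the subscription
  }
  update = {
    "$pull" => { "subscriptions.$.unread_entries" => { "_id" => entry_id } }
  }
  [filter, update]
end

# With the mongo gem, the pair would be sent roughly as:
#   users.update_one(filter, update)
filter, update = mark_read_update("user-1", "feed-7", "entry-42")
```

Because nothing is written for read entries, "no status exists" and "read" mean the same thing here, which is what keeps the unread list small.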

For starred entries, I would simply add an array of Entry ObjectIds to User. There's no need to make these subscription-specific; keeping them in one array makes it much easier to pull a list of everything a user has starred.
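A minimal sketch of what starring and unstarring could look like against that array, assuming it is stored as User.starred_entries. The helper names are hypothetical; `$addToSet` and `$pull` are standard MongoDB update operators.

```ruby
# Hypothetical update builders for User.starred_entries (an array of Entry ids).
# $addToSet appends the id only if it isn't already present, so starring twice
# is harmless; $pull removes the id when the user unstars the entry.
def star_update(entry_id)
  { "$addToSet" => { "starred_entries" => entry_id } }
end

def unstar_update(entry_id)
  { "$pull" => { "starred_entries" => entry_id } }
end

# With the mongo gem these would be used roughly as:
#   users.update_one({ "_id" => user_id }, star_update(entry_id))
#   users.find("_id" => user_id).projection("starred_entries" => 1).first
```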

Unread entries are a little more complex. I'd still store them as an array, but to satisfy your requirement of quickly marking all entries before a specific date as read, I would denormalize and save the publish date alongside the Entry ObjectId, in a new embedded 'UnreadEntry' document.

User
fields: email_address, starred_entries[]
subscriptions (embedded collection; fields: feed_id, tags, unread_entries[])

UnreadEntry
fields: _id (the Entry's ObjectId), publish_date
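For illustration, here is one hypothetical User document in that layout, written as a Ruby hash. The ids, email address and dates are made up, and strings stand in for ObjectIds; the point is that publish_date is copied from the Entry so date-based updates never need to touch the Entry collection.

```ruby
# Hypothetical User document in the proposed layout (made-up values).
user = {
  "_id" => "user-1",
  "email_address" => "reader@example.com",
  "starred_entries" => ["entry-3"],          # Entry ids, not per-subscription
  "subscriptions" => [
    {
      "feed_id" => "feed-7",
      "tags" => ["tech"],
      "unread_entries" => [                  # embedded UnreadEntry documents
        { "_id" => "entry-41", "publish_date" => Time.utc(2024, 1, 5) },
        { "_id" => "entry-42", "publish_date" => Time.utc(2024, 1, 9) }
      ]
    }
  ]
}
```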

You do need to keep the 16MB document size limit in mind, but 16MB is one hell of a lot of unread entries/feeds, so be realistic about whether you will ever hit it. (If you will, it should be fairly straightforward to break User.subscriptions out into its own collection.)
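To put a number on that, here is a back-of-the-envelope sketch. It assumes each UnreadEntry holds only the two fields above (a 12-byte ObjectId in `_id` and an 8-byte UTC datetime), follows the BSON encoding rules (an array is a document keyed "0", "1", ...), treats the limit as 16 MiB, and ignores everything else in the User document, so real headroom is somewhat lower.

```ruby
# Back-of-the-envelope BSON size estimate for unread_entries (sketch).
# BSON element = type byte + key as C string + value; an embedded document is
# int32 length + elements + trailing 0x00; arrays are documents keyed "0", "1", ...
OBJECT_ID_BYTES = 12
DATETIME_BYTES = 8
MAX_DOC_BYTES = 16 * 1024 * 1024 # MongoDB's 16 MiB document limit

def element_bytes(key, value_bytes)
  1 + key.bytesize + 1 + value_bytes
end

# One UnreadEntry { _id: ObjectId, publish_date: datetime } => 44 bytes
UNREAD_ENTRY_BYTES =
  4 + element_bytes("_id", OBJECT_ID_BYTES) +
  element_bytes("publish_date", DATETIME_BYTES) + 1

# Largest unread_entries array that fits, ignoring the rest of the User document.
def max_unread_entries(limit = MAX_DOC_BYTES)
  total = 4 + 1 # the array's own length prefix and terminator
  count = 0
  loop do
    size = element_bytes(count.to_s, UNREAD_ENTRY_BYTES)
    break if total + size > limit
    total += size
    count += 1
  end
  count
end

puts UNREAD_ENTRY_BYTES
puts max_unread_entries
```

That works out to roughly 50 bytes per unread entry, or a little over 300,000 unread entries before a single document runs out of room.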

Both of your queries now become fairly easy to write:

All unread entries for a specific feed: user.subscriptions.where(feed_id: feed_id).first.unread_entries
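If it helps to see the first query without an ODM, here is a hypothetical in-memory version over a plain User hash in the proposed layout (the helper name and sample data are made up). The comment shows how a server-side lookup could use MongoDB's positional projection to return only the matching subscription.

```ruby
# Hypothetical in-memory version of the first query: return the unread
# entries for one feed, or [] if the user isn't subscribed to it.
def unread_entries_for(user, feed_id)
  subscription = user.fetch("subscriptions").find { |s| s["feed_id"] == feed_id }
  subscription ? subscription.fetch("unread_entries") : []
end

# Server-side, with the mongo gem, roughly:
#   users.find("_id" => user_id, "subscriptions.feed_id" => feed_id)
#        .projection("subscriptions.$" => 1)

user = {
  "_id" => "user-1",
  "subscriptions" => [
    { "feed_id" => "feed-7", "tags" => [], "unread_entries" => [{ "_id" => "entry-42" }] }
  ]
}
unread_entries_for(user, "feed-9") # => []
```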

Mark all entries prior to a publish date as read: user.subscriptions.where(feed_id: feed_id).first.unread_entries.where(:publish_date.lte => my_date).delete_all
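The reason this meets the efficiency requirement is that the whole operation can be one `$pull` with a `$lte` condition, applied server-side in a single update no matter how many entries match. A minimal sketch, with hypothetical helper names, shows both the update document and an in-memory equivalent of the same rule:

```ruby
# Hypothetical builder: one update removes every unread entry for the feed
# whose publish_date is on or before the cutoff.
def mark_read_before_update(user_id, feed_id, cutoff)
  filter = { "_id" => user_id, "subscriptions.feed_id" => feed_id }
  update = {
    "$pull" => {
      "subscriptions.$.unread_entries" => { "publish_date" => { "$lte" => cutoff } }
    }
  }
  [filter, update]
end

# In-memory equivalent of the same rule, e.g. for unit tests.
def mark_read_before!(subscription, cutoff)
  subscription["unread_entries"].reject! { |e| e["publish_date"] <= cutoff }
  subscription
end
```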

And, of course, if you simply need to mark all entries in a feed as read, that's very easy: user.subscriptions.where(feed_id: feed_id).first.unread_entries.delete_all
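At the MongoDB level, clearing a whole feed could be a `$set` of an empty array on the matching subscription. This is a sketch; the helper name is hypothetical.

```ruby
# Hypothetical builder: mark every entry in one feed as read by emptying
# that subscription's unread_entries array in a single update.
def mark_feed_read_update(user_id, feed_id)
  filter = { "_id" => user_id, "subscriptions.feed_id" => feed_id }
  update = { "$set" => { "subscriptions.$.unread_entries" => [] } }
  [filter, update]
end

# With the mongo gem: users.update_one(*mark_feed_read_update(user_id, feed_id))
```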

answered Mar 20 '26 15:03 by tkrajcar