Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I implement an ordered array with mongodb without race-conditions?

I'm new to mongodb, maybe this is a trivial question. I have two mongodb collections: user and post. A user can create and follow multiple posts, and posts are listed sorted by last modification date. There may be a very large number of users following a specific post, so I don't want to keep the list of followers in each post document. On the other hand, one user will probably not follow more than a few thousand posts, so I decided to keep the list of followed posts' objectids in each user document.

In order to be able to quickly list the 50 most recently modified posts for a given user, I chose to keep the last_updated_at field along with the post objectid.

The post document is fairly basic:

{
    "_id" : ObjectId("5163deebe4d809d55d27e847"),
    "title" : "All about music"
    "comments": [...]
    ...
}

The user document looks like this:

{
  "_id": ObjectId("5163deebe4d809d55d27e846"),
  "posts": [{
    "post": ObjectId("5163deebe4d809d55d27e847"),
    "last_updated_at": ISODate("2013-04-09T11:27:07.184Z")
  }, {
    "post": ObjectId("5163deebe4d809d55d27e847"),
    "last_updated_at": ISODate("2013-04-09T11:27:07.187Z")
  }]
  ...
}

When a user creates or follows a post, I can simply $push the post's ObjectId and last_updated_at to the end of the posts list in the user's document. When a post is modified (for example when a comment is added to the post), I update the last_updated_at field for that post in all the follower's user documents. That's pretty heavy, but I don't know how to avoid it.

When I want to get the list of 50 most recently updated posts for a user, I unfortunately need to get the whole list of followed posts, then sort by last_updated_at in memory, then keep only the first 50 posts.

So I tried to change the implementation to reorder the list when a post is modified: I $push it to the end of the list, and $pull it from wherever it is. Since this is a two step procedure, there's a race condition where I might get twice the same post in the list. Is there no better way to maintain a sorted array in mongodb?

like image 658
MiniQuark Avatar asked Apr 09 '13 10:04

MiniQuark


1 Answers

Data model adjustment

Since you may have frequent updates to the latest posts for a given user, you probably want to avoid the overhead of rewriting data unnecessarily to maintain a sorted array.

A better approach to consider would be to flatten the data model and use a separate collection instead of an ordered array:

  • create a separate collection with the updated post stream: (userID, postID, lastUpdated)
  • when a post is updated, you can then do a simple update() with the multi:true and upsert:true options and $set the last_updated_at to the new value.
  • to retrieve the last 50 updated posts for a given userID you can do a normal find() with sort and limit options.
  • to automatically clean up the "old" documents you could even set a TTL expiry for this collection so the updates are removed from the activity stream after a certain number of days

Pushing to fixed-size & sorted arrays in MongoDB 2.4

If you do want to maintain ordered arrays, MongoDB 2.4 added two helpful features related to this use case:

  • Ability to push to fixed-sized arrays
  • Ability to push to arrays sorted by embedded document fields

So you can achieve your outcome of pushing to a fixed-sized array of 50 items sorted by last updated date descending:

db.user.update(
    // Criteria
    { _id: ObjectId("5163deebe4d809d55d27e846") },

    // Update
    { $push: {
        posts: {
            // Push one or more updates onto the posts array
            $each: [
                {
                    "post": ObjectId("5163deebe4d809d55d27e847"),
                    "last_updated_at": ISODate()
                }
            ],

            // Slice to max of 50 items
            $slice:-50,

            // Sorted by last_updated_at desc
            $sort: {'last_updated_at': -1}
        }
    }}
)

The $push will update the list in sorted order, with the $slice trimming the list to the first 50 items. Since the posts aren't unique you'll still need to $pull the original from the list first, eg:

db.user.update(
    // Criteria
    { _id: ObjectId("5163deebe4d809d55d27e846") },

    // Update
    { 
        $pull: {
            posts: { post: ObjectId("5163deebe4d809d55d27e847") }
        }
    }
)

A benefit of this approach is that array manipulation is being done on the server, but as with sorting the array in your application you may still be updating the document more than is required.

like image 141
Stennie Avatar answered Oct 07 '22 17:10

Stennie