Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mongodb aggregation vs client side processing

I have a blogs collection which has almost the following schema:

{ 
    title: { name: "My First Blog Post",
             postDate: "01-28-11" },
    content: "Here is my super long post ...",
    comments: [ { text: "This post sucks!"
              , name: "seanhess"
              , created: 01-28-14}
            , { text: "I know! I wish it were longer"
              , name: "bob"
              , postDate: 01-28-11} 
            ] 
}

I mainly want to run three queries:

  1. Give me all the comments made by only bob
  2. Find all the comments made at the same day the post is written which is comments.postDate = title.postDate.
  3. Find all the comments made by bob on the same day the post is written

My questions are as following:

  • These three are going to be really frequent queries, so is it a good idea to use aggregation framework?
  • For the third query, I can simply make a query like db.blogs.find({"comments.name":"bob"}, {comments.name:1, comments.postDate:1, title.postDate:1}) and then do a client side post processing to loop through the returned results. Is it a good idea? I'd like to note that it is possible that this might return several thousand documents back.
  • I will be happy if you can propose some ways to make the third query.
like image 217
anvarik Avatar asked Dec 04 '22 07:12

anvarik


1 Answers

It probably is best practice here to "break-up" your multiple questions in to several questions, if not only for that maybe the answer on one question would have led you to understand the other.

I am also not very keen on answering anything where there is no example shown of what yo have tried to do. But with that said and "shooting myself in the foot", the questions are reasonable from a design approach so I will answer.

Point 1 : Comments by "bob"

Standard $unwind and filter the results. Use $match first so you don't process unneeded documents.

db.collection.aggregate([

    // Match to "narrow down" the documents.
    { "$match": { "comments.name": "bob" }},

    // Unwind the array
    { "$unwind": "$comments" },

    // Match and "filter" just the "bob" comments
    { "$match": { "comments.name": "bob" }},

    // Possibly wind back the array
    { "$group": {
       "_id": "$_id",
       "title": { "$first": "$title" },
       "content": { "$first": "$content" },
       "comments": { "$push": "$comments" }
    }}
])

Point 2: All comments on the same day

db.collection.aggregate([

    // Try and match posts within a date or range
    // { "$match": { "title.postDate": Date( /* something */ ) }},

    // Unwind the array
    { "$unwind": "$comments" },

    // Aha! Project out the same day. Not the time-stamp.
    { "$project": {
        "title": 1,
        "content": 1,
        "comments": 1,
        "same": { "$eq": [
            {
                "year"   : { "$year":  "$title.postDate" },
                "month"  : { "$month": "$title.postDate" },
                "day": { "$dayOfMonth": "$title.postDate" }
            },
            {
                "year"   : { "$year": "$comments.postDate" },
                "month"  : { "$month": "$comments.postDate" },
                "day": { "$dayOfMonth": "$comments.postDate" }
            }
        ]}
     }},

     // Match the things on the "same 
     { "$match": { "same": true } },     

    // Possibly wind back the array
    { "$group": {
       "_id": "$_id",
       "title": { "$first": "$title" },
       "content": { "$first": "$content" },
       "comments": { "$push": "$comments" }
    }}

])

Point 3: "bob" on the same date

db.collection.aggregate([

    // Try and match posts within a date or range
    // { "$match": { "title.postDate": Date( /* something */ ) }},

    // Unwind the array
    { "$unwind": "$comments" },

    // Aha! Project out the same day. Not the time-stamp.
    { "$project": {
        "title": 1,
        "content": 1,
        "comments": 1,
        "same": { "$eq": [
            {
                "year"   : { "$year":  "$title.postDate" },
                "month"  : { "$month": "$title.postDate" },
                "day": { "$dayOfMonth": "$title.postDate" }
            },
            {
                "year"   : { "$year": "$comments.postDate" },
                "month"  : { "$month": "$comments.postDate" },
                "day": { "$dayOfMonth": "$comments.postDate" }
            }
        ]}
     }},

     // Match the things on the "same" field
     { "$match": { "same": true, "comments.name": "bob" } },     

    // Possibly wind back the array
    { "$group": {
       "_id": "$_id",
       "title": { "$first": "$title" },
       "content": { "$first": "$content" },
       "comments": { "$push": "$comments" }
    }}

])

Results

Honestly, and especially if you are using some indexing to feed to the initial $match stages of these operations, then it should be very clear that this will "run rings" around trying to iterate this in code.

At the very least this reduces the returned records "over the wire", so there is less network traffic. And of course there is less (or nothing) to post process once the query results have been received.

As a general convention, database server hardware tends to be an order of magnitude higher rated in performance than "application server" hardware. So again the general condition is that anything executed on the server will run faster.

  • Is aggregation the right thing: "Yes". and by a long long way. You even get a cursor very soon.

  • How can you do the queries you want: Shown to be pretty simple. And in real world code we never "hard code" this, we build it dynamically. So adding conditions and attributes should be as simple as all you normal data manipulation code.

So I would not normally answer this style of question. But say thank-you! Please ?

like image 169
Neil Lunn Avatar answered Dec 15 '22 11:12

Neil Lunn