I have a blogs
collection which has almost the following schema:
{
title: { name: "My First Blog Post",
postDate: "01-28-11" },
content: "Here is my super long post ...",
comments: [ { text: "This post sucks!"
, name: "seanhess"
, created: 01-28-14}
, { text: "I know! I wish it were longer"
, name: "bob"
, postDate: 01-28-11}
]
}
I mainly want to run three queries:
comments
made by only bob
comments
made at the same day the post is written which is comments.postDate = title.postDate
.comments
made by bob
on the same day the post is writtenMy questions are as following:
db.blogs.find({"comments.name":"bob"}, {comments.name:1, comments.postDate:1, title.postDate:1})
and then do a client side post processing to loop through the returned results. Is it a good idea? I'd like to note that it is possible that this might return several thousand documents back.It probably is best practice here to "break-up" your multiple questions in to several questions, if not only for that maybe the answer on one question would have led you to understand the other.
I am also not very keen on answering anything where there is no example shown of what yo have tried to do. But with that said and "shooting myself in the foot", the questions are reasonable from a design approach so I will answer.
Standard $unwind and filter the results. Use $match first so you don't process unneeded documents.
db.collection.aggregate([
// Match to "narrow down" the documents.
{ "$match": { "comments.name": "bob" }},
// Unwind the array
{ "$unwind": "$comments" },
// Match and "filter" just the "bob" comments
{ "$match": { "comments.name": "bob" }},
// Possibly wind back the array
{ "$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"content": { "$first": "$content" },
"comments": { "$push": "$comments" }
}}
])
db.collection.aggregate([
// Try and match posts within a date or range
// { "$match": { "title.postDate": Date( /* something */ ) }},
// Unwind the array
{ "$unwind": "$comments" },
// Aha! Project out the same day. Not the time-stamp.
{ "$project": {
"title": 1,
"content": 1,
"comments": 1,
"same": { "$eq": [
{
"year" : { "$year": "$title.postDate" },
"month" : { "$month": "$title.postDate" },
"day": { "$dayOfMonth": "$title.postDate" }
},
{
"year" : { "$year": "$comments.postDate" },
"month" : { "$month": "$comments.postDate" },
"day": { "$dayOfMonth": "$comments.postDate" }
}
]}
}},
// Match the things on the "same
{ "$match": { "same": true } },
// Possibly wind back the array
{ "$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"content": { "$first": "$content" },
"comments": { "$push": "$comments" }
}}
])
db.collection.aggregate([
// Try and match posts within a date or range
// { "$match": { "title.postDate": Date( /* something */ ) }},
// Unwind the array
{ "$unwind": "$comments" },
// Aha! Project out the same day. Not the time-stamp.
{ "$project": {
"title": 1,
"content": 1,
"comments": 1,
"same": { "$eq": [
{
"year" : { "$year": "$title.postDate" },
"month" : { "$month": "$title.postDate" },
"day": { "$dayOfMonth": "$title.postDate" }
},
{
"year" : { "$year": "$comments.postDate" },
"month" : { "$month": "$comments.postDate" },
"day": { "$dayOfMonth": "$comments.postDate" }
}
]}
}},
// Match the things on the "same" field
{ "$match": { "same": true, "comments.name": "bob" } },
// Possibly wind back the array
{ "$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"content": { "$first": "$content" },
"comments": { "$push": "$comments" }
}}
])
Honestly, and especially if you are using some indexing to feed to the initial $match stages of these operations, then it should be very clear that this will "run rings" around trying to iterate this in code.
At the very least this reduces the returned records "over the wire", so there is less network traffic. And of course there is less (or nothing) to post process once the query results have been received.
As a general convention, database server hardware tends to be an order of magnitude higher rated in performance than "application server" hardware. So again the general condition is that anything executed on the server will run faster.
Is aggregation the right thing: "Yes". and by a long long way. You even get a cursor very soon.
How can you do the queries you want: Shown to be pretty simple. And in real world code we never "hard code" this, we build it dynamically. So adding conditions and attributes should be as simple as all you normal data manipulation code.
So I would not normally answer this style of question. But say thank-you! Please ?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With