Suppose I have a website with a bunch of articles, and people can vote on the articles they like.
I want to be able to query for the articles with the most votes within a certain time window (last hour, last day, last week), ordered by the number of votes.
As usual with MongoDB, there are several different ways to implement this, but I am not sure which one is correct. These are the two schemas I'm considering: votes embedded in the post document, or a separate document per vote.
{
    "_id": ObjectId(xxxx),
    "title": "Post Title",
    "postdate": "21/02/2012+1345",
    "summary": "Summary of Article",
    "Votes": [
        {
            "userid": ObjectId(xxxx),
            "username": "Joe Smith",
            "votedate": "03/03/2012+1436"
        },
    ]
}
{
    "_id": ObjectId(xxxx),
    "postId": ObjectId(xxxx),
    "userId": ObjectId(xxxx),
    "votedate": "03/03/2012+1436"
}
The first one is more document-oriented, but I have no idea how to query the votes array to get the documents with the most votes in the last 24 hours.
I'm leaning towards the second one, as I think it would be easier to query the vote count grouped by post, but I'm not sure how well it would perform. This is how you'd do it in a relational database, and it doesn't seem very document-oriented. I'm not sure whether that's a problem. Is it?
Or do I use a combination of the two? Also, would I run this type of aggregate query in real time, on every page load? Or do I just run the query, say, once per minute and store the results in a query result collection?
How would you implement this schema?
The common way to track counts for votes overall would be to keep the number of votes in the post document and to update it atomically when pushing a new value to the votes array.
Since it's a single update, you are guaranteed that the count will match the number of elements in the array.
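As a rough sketch of that single update (the counter field name voteCount and the helper buildVoteUpdate are made up for illustration; Votes, userid, username and votedate come from the schema in the question), the update document could be built like this:

```javascript
// Hypothetical helper: builds ONE update document that appends a vote to the
// Votes array and increments a counter in the same operation.
// "voteCount" is an assumed field name, not something from the question.
function buildVoteUpdate(userId, username, votedAt) {
  return {
    $push: { Votes: { userid: userId, username: username, votedate: votedAt } },
    $inc: { voteCount: 1 }
  };
}

var update = buildVoteUpdate("u1", "Joe Smith", new Date());

// In the mongo shell you would apply it to one post, for example:
// db.posts.update({ _id: postId }, update);
```

Because both operators travel in the same update to the same document, the array and the counter change together.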
If the set of aggregations is fixed and the site is very busy, you could extend this approach and increment additional counters, such as one each for month, day and hour, but that could get out of hand very quickly. Instead, you could use the new Aggregation Framework (available in the 2.1.2 dev release; it will be in production in release 2.2). It is simpler to use than Map/Reduce, and it lets you do the calculations you want very simply, especially if you take care to store your vote dates as ISODate() type.
A typical aggregation pipeline for the top vote getters this month might look something like this:
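The dates in the question are stored as strings such as "03/03/2012+1436", which cannot be range-compared as dates. Here is a minimal sketch of converting them before saving, assuming the layout is DD/MM/YYYY+HHMM (the "21/02/2012" in the question suggests day first) and that the times are UTC; parseVoteDate is a hypothetical helper name:

```javascript
// Hypothetical helper: turns "DD/MM/YYYY+HHMM" into a JavaScript Date.
// Assumptions: day comes before month, and the time is in UTC.
function parseVoteDate(s) {
  var m = /^(\d{2})\/(\d{2})\/(\d{4})\+(\d{2})(\d{2})$/.exec(s);
  if (!m) {
    throw new Error("Unrecognized vote date: " + s);
  }
  // Date.UTC takes a zero-based month, hence the "- 1".
  return new Date(Date.UTC(+m[3], +m[2] - 1, +m[1], +m[4], +m[5]));
}

var parsed = parseVoteDate("03/03/2012+1436");
// parsed.toISOString() is "2012-03-03T14:36:00.000Z"
```

A Date saved from the shell is stored as a BSON date (shown as ISODate), which is what the range matches and date operators in the pipelines work on.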
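var today = new Date();
// First instant of the current month and of the next month (local time).
var thisMonth = new Date(today.getFullYear(), today.getMonth());
var thisMonthEnd = new Date(today.getFullYear(), today.getMonth() + 1);
db.posts.aggregate( [
    // Keep only posts that have at least one vote this month.
    {$match: { "Votes.votedate": {$gte: thisMonth, $lt: thisMonthEnd} } },
    // Emit one document per vote.
    {$unwind: "$Votes" },
    // Drop the votes that fall outside this month.
    {$match: { "Votes.votedate": {$gte: thisMonth, $lt: thisMonthEnd} } },
    // Count the remaining votes per post (assumes titles are unique).
    {$group: { _id: "$title", votes: {$sum: 1} } },
    {$sort: {"votes": -1} },
    {$limit: 10}
] );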
The first $match limits the input to the pipeline to posts that have at least one vote dated in the month you are counting. $unwind then produces one document per vote, and the second $match is needed because a matching post can also carry older votes that should not be counted. The $group stage is the "group by" equivalent, summing up the votes for each title (I'm assuming title is unique). Finally, the pipeline sorts descending by number of votes and limits the output to the first ten.
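The same stages work for the rolling windows asked about in the question (last hour, day, week). Below is a sketch that builds the pipeline for an arbitrary window; topVotedPipeline and HOUR_MS are hypothetical names, and as a variation on the pipeline above it groups by _id and carries the title along with $first, so it does not rely on titles being unique:

```javascript
var HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: returns the pipeline stages for "top voted posts
// during the windowMs milliseconds before now".
function topVotedPipeline(windowMs, now, limit) {
  var since = new Date(now.getTime() - windowMs);
  var inWindow = { "Votes.votedate": { $gte: since, $lte: now } };
  return [
    { $match: inWindow },      // posts with at least one vote in the window
    { $unwind: "$Votes" },     // one document per vote
    { $match: inWindow },      // keep only the votes inside the window
    { $group: { _id: "$_id", title: { $first: "$title" }, votes: { $sum: 1 } } },
    { $sort: { votes: -1 } },
    { $limit: limit }
  ];
}

var lastDay = topVotedPipeline(24 * HOUR_MS, new Date(), 10);

// In the mongo shell:
// db.posts.aggregate(lastDay);
```

Passing HOUR_MS or 7 * 24 * HOUR_MS as the first argument gives the last-hour and last-week variants.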
You can also aggregate votes by day (for example) for that month to see which days are most active for voting:
db.posts.aggregate( [
    {$match: { "Votes.votedate": {$gte: thisMonth, $lt: thisMonthEnd} } },
    {$unwind: "$Votes" },
    {$match: { "Votes.votedate": {$gte: thisMonth, $lt: thisMonthEnd} } },
    // Reduce each vote to the day of the month it was cast on.
    {$project: { "day" : { "$dayOfMonth" : "$Votes.votedate" } } },
    // Count votes per day, busiest days first.
    {$group: { _id: "$day", votes: {$sum: 1} } },
    {$sort: {"votes": -1} },
    {$limit: 10}
] );