 

Schema for User Ratings - Key/Value DB

We're using MongoDB and I'm figuring out a schema for storing Ratings.

  • Ratings will have values of 1-5.
  • I also want to store other values, such as fromUser.

This is fine, but my main question is how to set it up so that recalculating the average is as efficient as possible.


SOLUTION 1 - Separate Ratings Class

My first thought was to create a separate Ratings class and store an array of pointers to Ratings in the User class. I second-guessed this because we would have to query for all of the Ratings objects every time a new Rating comes in, just to recalculate the average.

...

SOLUTION 2 - Dictionary in User Class

My second thought was to store a dictionary of these Ratings objects directly in the User class. This would be slightly more lightweight than Solution 1, but we'd be rewriting each user's entire Ratings history on every update, which seems dangerous.

...

SOLUTION 3 - Separate Ratings Class with Separate Averages in User Class

A hybrid option: Ratings live in their own class, with a pointer array to them in the User class, but we also keep two values in the User class, ratingsAve and ratingsCount. When a new Rating comes in, we save that Rating object and update ratingsAve directly from ratingsAve and ratingsCount, without reading the Ratings history.


Solution 3 sounds best to me, but I'm wondering whether we'd need periodic calibration, re-querying the Ratings history to reset ratingsAve, just to make sure everything checks out.
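For reference, the Solution 3 update can be done from ratingsAve and ratingsCount alone. Below is a minimal sketch in plain JavaScript with no database involved. `updateAverage` and `recalculateAverage` are hypothetical helpers, and the full recalculation stands in for the periodic calibration query.

```javascript
// Solution 3 state: a running average plus a count, updated without reading the history.
// updateAverage and recalculateAverage are hypothetical helper names.
function updateAverage(ratingsAve, ratingsCount, newValue) {
  const count = ratingsCount + 1;
  // new average = old average + (new value - old average) / new count
  return { ratingsAve: ratingsAve + (newValue - ratingsAve) / count, ratingsCount: count };
}

// Full recalculation from the stored Ratings, e.g. for a periodic consistency check.
function recalculateAverage(values) {
  if (values.length === 0) return { ratingsAve: 0, ratingsCount: 0 };
  const sum = values.reduce((acc, v) => acc + v, 0);
  return { ratingsAve: sum / values.length, ratingsCount: values.length };
}

const ratings = [5, 4, 4, 2, 3];
let state = { ratingsAve: 0, ratingsCount: 0 };
for (const value of ratings) {
  state = updateAverage(state.ratingsAve, state.ratingsCount, value);
}

const check = recalculateAverage(ratings);
// The two agree up to floating-point rounding.
console.log(state.ratingsCount, state.ratingsAve.toFixed(2), check.ratingsAve.toFixed(2)); // 5 3.60 3.60
```

One design note, which is an observation rather than something from the question: if you store a running sum of rating values instead of the average itself, each new rating becomes two integer increments (MongoDB's `$inc`), which are exact, so there is no rounding drift to correct. A periodic recalculation then mainly catches partial writes, such as a Rating that was saved while the counters were not updated.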

I might be overthinking this, but I'm not that great at DB schema design, and this seems like a standard schema problem that I should know how to solve.

Which is the best option to ensure consistency but also efficiency of recalculation?

Asked Nov 13 '14 at 17:11 by OdieO




2 Answers

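First of all, 'Dictionary in User Class' is not a good idea. Why? Adding a rating means pushing a new item onto the embedded array, which makes the document grow; once it outgrows the space allocated for it on disk, MongoDB has to relocate the whole document, which is called "moving a document". Moving documents is slow, and MongoDB is not great at reusing the space left behind, so moving documents around a lot can result in large swaths of empty data file (this point comes from the book 'MongoDB: The Definitive Guide'). Note that document moves are a behavior of the MMAPv1 storage engine, which was MongoDB's default when this question was asked; WiredTiger has been the default since MongoDB 3.2.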

So what is the correct solution? Assume you have a collection named Blog and want to implement a rating system for your blog posts, while also keeping track of every individual user's rating.

The schema for a blog document would look like this:

{
   _id : ....,
   title: ....,
   ....
   rateCount : 0,
   rateValue : 0,
   rateAverage: 0
}

You need another collection (Rates) with this document schema:

{
    _id: ....,
    userId: ....,
    postId:....,
    value: ..., //1 to 5
    date:....   
}

And you need to define a proper index for it:

// createIndex replaces the deprecated ensureIndex; unique: true also stops a user from rating the same post twice
db.Rates.createIndex({userId : 1, postId : 1}, {unique : true}) // compound index: makes "has this user already rated this post?" lookups fast

When a user wants to rate a post, first check whether they have already rated it. Assuming the user is 'user1' and the post is 'post1', the query would be:

var ratedBefore = db.Rates.countDocuments({userId : 'user1', postId : 'post1'}) > 0;

Based on ratedBefore: if the user has not rated the post yet, insert a new rate document into the Rates collection and update the post's counters; otherwise, the user is not allowed to rate it again.

if (!ratedBefore)
{
    var postId = 'post1';  // this id should be passed in by the client driver
    var userId = 'user1';  // this id should be passed in by the client driver
    var rateValue = 1;     // 1 to 5
    var rate =
    {
       userId: userId,
       postId: postId,
       value: rateValue,
       date: new Date()
    };

    db.Rates.insertOne(rate);
    // Bump the post's running count and sum of rating values in a single atomic update
    db.Blog.updateOne({"_id" : postId}, {$inc : {'rateCount' : 1, 'rateValue' : rateValue}});
}
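The same rate-once flow can be sketched without a running database. This is a minimal in-memory sketch in plain JavaScript, assuming hypothetical stand-ins: a `blogs` Map for the Blog collection, a `rates` array for Rates, a `rateKeys` Set playing the role of a unique index on (userId, postId), and a hypothetical `ratePost` function.

```javascript
// Hypothetical in-memory stand-ins for the Rates and Blog collections.
const rates = [];              // rate documents: { userId, postId, value, date }
const rateKeys = new Set();    // acts like a unique index on (userId, postId)
const blogs = new Map([["post1", { _id: "post1", rateCount: 0, rateValue: 0 }]]);

// Records a rating and returns true, or returns false if the user already rated the post.
function ratePost(userId, postId, value) {
  if (!Number.isInteger(value) || value < 1 || value > 5) {
    throw new RangeError("rating must be an integer from 1 to 5");
  }
  const blog = blogs.get(postId);
  if (!blog) throw new Error(`unknown post: ${postId}`);

  const key = JSON.stringify([userId, postId]);
  if (rateKeys.has(key)) return false;   // ratedBefore: reject a second rating
  rateKeys.add(key);

  rates.push({ userId, postId, value, date: new Date() });
  blog.rateCount += 1;                   // mirrors $inc: { rateCount: 1 }
  blog.rateValue += value;               // mirrors $inc: { rateValue: value }
  return true;
}

// The average is derived on read from the two counters, never stored.
function averageRating(blog) {
  return blog.rateCount > 0 ? blog.rateValue / blog.rateCount : null;
}

ratePost("user1", "post1", 5);
ratePost("user2", "post1", 4);
const duplicate = ratePost("user1", "post1", 1); // user1 already rated post1
console.log(duplicate, averageRating(blogs.get("post1"))); // false 4.5
```

In MongoDB itself, the counterpart of the Set check is a unique index on { userId: 1, postId: 1 }: inserting a duplicate rating then fails with a duplicate-key error, and skipping the `$inc` in that case closes the window between the count check and the insert when two requests for the same user and post arrive at once.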

So what happens to rateAverage? I strongly recommend calculating it on the client side from rateCount and rateValue instead of storing it (so the rateAverage field in the schema above can be left out). Updating rateAverage with a MongoDB query is easy, but you shouldn't do it. Why? The client can compute an average trivially, while keeping rateAverage on every blog document costs an extra, unnecessary update operation for each rating.

The average is then calculated on read:

var blog = db.Blog.findOne({"_id" : "post1"});
// Guard against division by zero for posts that have no ratings yet
var avg = blog.rateCount > 0 ? blog.rateValue / blog.rateCount : null;
print(avg);

With this approach you get good performance from MongoDB and still keep a record of every rating by user, post and date.

Answered Oct 05 '22 at 18:10 (2 revs, 2 users 67%)


My solution is quite simple: similar to your third option, but simpler. Let's say we have three models: Book, User and Rating. I added a new field, totalRated, to the Book model: an array of integers counting how many ratings of each value the book has received, where the entry at index i holds the count for a rating of i + 1.

Your ratings run from 1 to 5, so totalRated is:

  • [total1star, total2star, total3star, total4star, total5star]

Every time a user rates this book, I create a document in the Rating collection and increment the matching totalRated entry by 1 (the entry at index rating - 1).

The results:

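  • rateCount is the sum of all the entries in the array.
  • rateAverage = (totalRated[0]*1 + totalRated[1]*2 + ... + totalRated[4]*5) / rateCount, i.e. each count weighted by its rating value (index + 1).
  • The number of ratings with a given value v can be read directly from totalRated[v - 1].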

Step by step

By default, the document is:

// Book Document
{
 _id,
 totalRated: [0, 0, 0, 0, 0],
 ...otherFields
}
  • If user1 rates this book 5 stars, the document becomes:
{
 _id,
 totalRated: [0, 0, 0, 0, 1],
 ...otherFields
}
  • If user2 then rates it 4 stars, the document becomes:
{
 _id,
 totalRated: [0, 0, 0, 1, 1],
 ...otherFields
}
  • If user3 also rates it 4 stars, the document becomes:
{
 _id,
 totalRated: [0, 0, 0, 2, 1],
 ...otherFields
}
  • rateCount = 0 + 0 + 0 + 2 + 1 = 3
  • rateAverage = (0*1 + 0*2 + 0*3 + 2*4 + 1*5) / 3 = 13 / 3 = 4.333...
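The walk-through above can be replayed in a few lines of plain JavaScript. This is a minimal sketch with hypothetical `addRating` and `ratingStats` helpers that operate on a plain array rather than a stored Book document.

```javascript
// Hypothetical helpers for the totalRated histogram described above.
// totalRated[i] holds the number of (i + 1)-star ratings.

// Record one rating. In MongoDB the same step could be an update such as
// { $inc: { "totalRated.3": 1 } } for a 4-star rating (array element addressed by index).
function addRating(totalRated, value) {
  if (!Number.isInteger(value) || value < 1 || value > totalRated.length) {
    throw new RangeError(`rating must be an integer from 1 to ${totalRated.length}`);
  }
  totalRated[value - 1] += 1;
}

// Derive the count, the average and per-value counts from the histogram.
function ratingStats(totalRated) {
  const rateCount = totalRated.reduce((sum, n) => sum + n, 0);
  const weightedSum = totalRated.reduce((sum, n, i) => sum + (i + 1) * n, 0);
  return {
    rateCount,
    rateAverage: rateCount > 0 ? weightedSum / rateCount : null,
    countFor: (value) => totalRated[value - 1],
  };
}

// Replay the walk-through: user1 rates 5, user2 rates 4, user3 rates 4.
const totalRated = [0, 0, 0, 0, 0];
for (const v of [5, 4, 4]) addRating(totalRated, v);

const stats = ratingStats(totalRated);
console.log(totalRated);                   // [ 0, 0, 0, 2, 1 ]
console.log(stats.rateCount);              // 3
console.log(stats.rateAverage.toFixed(2)); // 4.33
console.log(stats.countFor(4));            // 2
```

Because every rating only increments one integer, the stored document never needs to be rewritten wholesale, and the average stays exact until it is computed on read.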

Note: You could use an object instead of an integer array, with the rating value as the key and its count as the value, but an integer array is enough for me.

Answered Oct 05 '22 at 16:10 by Huy Nguyen