I have a movie database that I need to populate with data so it becomes easier to test and develop the application. There are tables to hold movie ratings and user accounts; the users rate the movies.
I've started writing a script to populate the database with fake, generic data, but I don't know how to randomize the ratings. For each movie I select a random number of users (100, 500, 1000, whatever), and for each of those users I pick a random rating from 1 through 10. But these ratings always end up with the same average, around 5, which means the distribution of ratings (1 through 10) is basically the same for every movie. This is not "realistic" at all: every movie with ratings generated this way has the same average, no matter which users or how many users rate it.
I want movie A to have an average of 7, movie B an average of 5, movie C an average of 8, etc. But I don't just want the average to be different for every movie; the shape of the distribution should vary too. I mean, it would be nice to produce ratings like this (for a specific number of users): http://www.imdb.com/title/tt1046173/ratings or this http://www.imdb.com/title/tt0486640/ratings
You know, something random that could produce two different variations like those above. I hit refresh and I get the first graph, I hit refresh and get the second, hit again and get something different or similar, something "random" and "realistic".
I'm also going to display graphs like this in my app, so it would look nice to have different distributions. But I have no idea how to generate all that randomly with a simple script.
How can I solve this? Or is it too much work to be worth it?
Maybe something simpler, like select a point (between 1 and 10) and then create a normal distribution of ratings where that selected point is the highest one, that would work for me.
You want to fix the mean, and probably the variance, and generate random numbers around those.
This should help you get started: Generating random numbers with known mean and variance
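As a rough illustration of fixing the mean and variance, here is a minimal Python sketch. It assumes a normal distribution is close enough for test data; the function names, the default standard deviation, and the ranges used to pick a per-movie mean and spread are all made up for the example, not taken from the question.

```python
import random

def generate_ratings(n_users, mean, stddev=1.5, low=1, high=10, rng=random):
    """Draw n_users integer ratings scattered around `mean`.

    Each rating is a normal sample rounded to the nearest integer and
    clamped to [low, high], so the realized average drifts slightly
    toward the middle when `mean` sits close to either end of the scale.
    """
    ratings = []
    for _ in range(n_users):
        value = rng.gauss(mean, stddev)
        ratings.append(min(high, max(low, round(value))))
    return ratings

def random_movie_ratings(n_users, rng=random):
    """Pick a random target mean and spread per movie (assumed ranges)."""
    mean = rng.uniform(3.0, 9.0)
    stddev = rng.uniform(1.0, 2.5)
    return generate_ratings(n_users, mean, stddev, rng=rng)

if __name__ == "__main__":
    for movie in ("A", "B", "C"):
        ratings = random_movie_ratings(500)
        print(movie, sum(ratings) / len(ratings))
```

Calling random_movie_ratings once per movie should give each movie its own peak and width, so the histograms differ from one movie to the next.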
Edit: Actually, if you think about it, this can be solved easily: the reason your numbers tend towards 5 is that you are drawing uniformly from a scale of 1 to 10, so the mean is 5.5.
Just take your random numbers, shift them all up by roughly the difference between your target average and 5.5 (for a target of 8, add 3), and cap any number greater than 10 at 10. You'll get something centered around 8-ish (but with a pile-up of ratings at 10). Probably good enough for your purposes?
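The shift-and-cap idea could look something like this in Python. This is only a sketch: the function name and the `shift` parameter are hypothetical, and the base ratings are assumed to be uniform integers from 1 to 10, as in the question. With that assumption, a shift of 3 lands the average a little under 8.

```python
import random

def shifted_ratings(n_users, shift, low=1, high=10, rng=random):
    """Uniform ratings in [low, high], shifted by `shift` and clamped.

    Every value pushed past `high` collapses onto `high`, so a positive
    shift produces a spike at the top of the scale rather than a smooth
    bell shape.
    """
    return [
        min(high, max(low, rng.randint(low, high) + shift))
        for _ in range(n_users)
    ]

if __name__ == "__main__":
    ratings = shifted_ratings(1000, 3)
    print(sum(ratings) / len(ratings))
```

A negative shift works the same way in the other direction, with the pile-up at 1 instead of 10.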