I'm writing an app on Google App Engine to help me learn it better. I'm persisting my data in the Datastore.
The application is modeled similarly to StackOverflow: you have a Story entity, which has a collection of Comment entities, which in turn can be liked/hated by many users. The way I'm modeling this right now is as follows:
class Story {
    Comment[] comments;
    ...
}
class Comment {
    User[] likes;
    User[] hates;
    ...
}
So when you load a given story, you can list all the comments, plus the percentage of likes and hates for each comment. You can also keep track of whether a given user has voted on a comment.
I'm assuming I can lazy load all the actual users in the Comment entity, but even then, I kind of get the idea that there's a better way of doing this.
How would this handle a story with hundreds of comments, each with hundreds of thousands of votes?!
What is a common way of modeling such a concept in NoSQL?
Possible answers:
(1) How would this handle hundreds of comments?
You seem to have already answered this by suggesting that you lazy load the comments in the UI. I know document databases like Mongo and CouchDB give you the option to page data as it comes out of the database, with things like "limit" and "skip".
Hundreds of comments shouldn't be too hard to store and I wouldn't imagine they'd be slow in a query.
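To make the paging idea concrete, here is a minimal sketch in plain Java. It is not Datastore, Mongo, or CouchDB code: CommentPager and page are hypothetical names, and an in-memory list stands in for the query result, just to show what "skip" and "limit" do to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pages an in-memory list the way "skip"/"limit" page a query result.
final class CommentPager {
    private CommentPager() {}

    // Returns at most `limit` items, starting after the first `skip` items.
    static <T> List<T> page(List<T> items, int skip, int limit) {
        if (skip < 0 || limit < 0) {
            throw new IllegalArgumentException("skip and limit must be non-negative");
        }
        int from = Math.min(skip, items.size());
        // Widen to long so that from + limit cannot overflow int.
        int to = (int) Math.min((long) from + limit, items.size());
        return new ArrayList<>(items.subList(from, to));
    }
}
```

With 250 comments, page(comments, 200, 100) would return only the last 50, and a skip past the end returns an empty list. A real database applies the same window on the server, so only one page travels to the app.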
(2) How to handle hundreds of thousands of votes?
I think the best way is to simply pre-process this. When a user votes on something, you might consider doing two operations: 1) Increment the comment's like counter by one. 2) Write a record of the user's vote somewhere else.
The first step would be very fast and easy and it would show users the total number of likes immediately.
The second operation (storing what a user did - which comment they liked/disliked) might be a bit slower, but you can easily do it.
It's important to keep in mind that with NoSQL we aren't worried about normalizing the data, so redundant information is ok!
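Here is a rough sketch of those two operations in plain Java. Everything in it is an assumption for illustration: Vote and CommentVotes are made-up names, and a HashMap stands in for wherever the per-user vote records would actually be stored. It also does both writes under one lock, which a real datastore would not necessarily give you.

```java
import java.util.HashMap;
import java.util.Map;

enum Vote { LIKE, HATE }

// In-memory stand-in for the datastore; all names here are hypothetical.
final class CommentVotes {
    // Operation 1: pre-computed counters kept on the comment itself.
    private long likes;
    private long hates;
    // Operation 2: one record per user, stored apart from the counters.
    private final Map<String, Vote> voteByUser = new HashMap<>();

    // Records the vote and updates the counters.
    // Returns false if the user had already cast the same vote.
    synchronized boolean vote(String userId, Vote vote) {
        Vote previous = voteByUser.put(userId, vote);
        if (previous == vote) {
            return false; // duplicate vote, counters unchanged
        }
        if (previous != null) {
            adjust(previous, -1); // the user switched sides
        }
        adjust(vote, +1);
        return true;
    }

    private void adjust(Vote vote, int delta) {
        if (vote == Vote.LIKE) {
            likes += delta;
        } else {
            hates += delta;
        }
    }

    synchronized long likes() { return likes; }
    synchronized long hates() { return hates; }

    // Returns the user's vote, or null if they have not voted.
    synchronized Vote voteOf(String userId) { return voteByUser.get(userId); }

    // Percentage of likes among all votes; 0.0 when nobody has voted.
    synchronized double likePercentage() {
        long total = likes + hates;
        return total == 0 ? 0.0 : 100.0 * likes / total;
    }
}
```

Showing a story then only needs the two counters per comment, and checking whether the current user has voted is a single lookup by user id, rather than loading hundreds of thousands of User entities.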
(3) What is the common way of modeling these concepts?
As I mentioned in (2), and from my experience, a good way to model this is to increment counters quickly and to also store redundant information.
It's especially useful to store data many times in various documents because joining in things like Mongo and Couch is very difficult to do. It's best to store that information next to the entity that needs it.
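As a small illustration of storing information next to the entity that needs it, here is a sketch in plain Java. StoryUser, CommentDoc and their fields are hypothetical names, not part of any Datastore or Mongo API; the point is only that the comment carries its own copy of the author's name.

```java
// Hypothetical user entity.
final class StoryUser {
    final String id;
    final String displayName;

    StoryUser(String id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }
}

// Hypothetical comment document that carries a redundant copy of the author's name.
final class CommentDoc {
    final String text;
    final String authorId;   // reference kept in case the copy must be refreshed later
    final String authorName; // redundant copy, so rendering needs no join

    private CommentDoc(String text, String authorId, String authorName) {
        this.text = text;
        this.authorId = authorId;
        this.authorName = authorName;
    }

    // Copies the name at write time instead of looking it up at read time.
    static CommentDoc write(StoryUser author, String text) {
        return new CommentDoc(text, author.id, author.displayName);
    }

    // Renders the comment using only data stored on the comment itself.
    String render() {
        return authorName + ": " + text;
    }
}
```

The trade-off is that if the user later changes their display name, every copy has to be updated or allowed to go stale.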
Another quality of NoSQL databases is that they are allowed to be inconsistent. It's ok to have a comment like/dislike count be one number in the comments section and a different number when looking at what the user has liked/disliked.
(The only note about your model that might be scary is splitting entities. Always remember that if you split things up, the way you would in a traditional RDBMS, you'll have to join them later! That can be very tough with NoSQL.)