How to structure a DynamoDB database to allow queries for trending posts?

Tags:

amazon-dynamodb

I am planning on using the following formula to calculate "trending" posts:

Trending Score = (p - 1) / (t + 2)^1.5

p = votes (points) from users. t = time since submission in hours.

I am looking for advice on how to structure my database tables so that I can query for trending posts with DynamoDB (a nosql database service from Amazon).

DynamoDB requires a Primary Key for each item in a table. The Primary Key can consist of 2 parts: the Hash Attribute (string or number) and the Range Attribute (string or number). The Hash Attribute must be unique for each item and is required. The Range Attribute is optional, but if used DynamoDB will build a sorted range index on the Range Attribute.

The structure I had in mind goes as follows:

TableName: Users

Click to copy

HashAttribute:  user_id
RangeAttribute: NONE
OtherFields: first_name, last_name

TableName: Posts

Click to copy

HashAttribute:  post_id
RangeAttribute: NONE
OtherFields: user_id,title, content, points, categories[ ]

TableName: Categories

Click to copy

HashAttribute:  category_name
RangeAttribute: post_id
OtherFields: title, content, points

TableName: Counters

Click to copy

HashAttribute:  counter_name
RangeAttribute: NONE
OtherFields: counter_value

So here is an example of the types of requests I would make with the following table setup (example: user_id=100):

User Action 1:

User creates a new post and tags the post for 2 categories (baseball,soccer)

Query (1):

Check current value for the counter_name='post_id' and increment+1 and use the new post_id

Query (2): Insert the following into the Posts table:

Click to copy

post_id=value_from_query_1, user_id=100, title=user_generated, content=user_generated, points=0, categories=['baseball','soccer']

Query (3):

Insert the following into the Categories table:

Click to copy

category_name='baseball', post_id=value_from_query_1, title=user_generated, content=user_generated, points=0

Query (4):

Insert the following into the Categories table:

Click to copy

category_name='soccer', post_id=value_from_query_1, title=user_generated, content=user_generated, points=0

The end goal is to be able to conduct the following types of queries:

1. Query for trending posts

2. Query for posts in a certain category

3. Query for posts with the highest point values

Does anyone have any idea how I could structure my tables so that I could do a query for trending posts? Or is this something I give the up the ability to do by switching to DynamoDB?

730

asked Feb 18 '12 05:02

Jason Pudzianowski

1 Answers

I'm starting with a note on your comment with the timestamp vs post_id.
Since you are going to use DynamoDB as your post_id generator, there is a scalability issue right there. Those numbers are inherently unscalable and you better off using a date object. If you need to create posts in a crazy speed time you can start reading about how twitter are doing it http://blog.twitter.com/2010/announcing-snowflake

Now let's get back to your trending check:
I believe your scenario is misusing DynamoDB.
Let's say you have one HOT category that has most posts in it. Basically you will have to scan the whole posts (since the data isn't spread well) and for each start to look at the points and do the comparisons in your server. This will just not work or will be very expensive since each time you will probably use all your reserved read units capacity.

The DynamoDB approach for those type of trends checking is using MapReduce
Read here how to implement those: http://aws.typepad.com/aws/2012/01/aws-howto-using-amazon-elastic-mapreduce-with-dynamodb.html

I can't specify a time, but I believe you will find this approach scalable - though you won't be able to use it often.

On another note - you could keep a list of the "top 10/100" trendy questions and you update them in "real-time" when a post is upvoted - you get the list, check if it needs to be updated with the newly upvoted question and save it back to the db if needed.

answered Sep 20 '22 14:09

Chen Harel

Related questions
                            
                                Single table db architecture with AWS Amplify
                            
                                It is possible to manage users/identities in a data store that exhibits eventual consistency?
                            
                                RavenDB - LINQ - Count() discrepancies
                            
                                How to store rich text data in MongoDB and CouchDB from Node.js
                            
                                Overhead and (in)efficiency of NoSQL databases?
                            
                                Which NoSQL database best for append only audit logging use case?
                            
                                Google Apps Script Apps Suddenly Performing Slowly/Erroring
                            
                                Long db query result with concurrent write at the same time
                            
                                How to handle a change in denormalized data
                            
                                Mongo - What does WriteConcern j option do when journaling is turned off?
                            
                                CQRS: Update Read-Model without Event Sourcing
                            
                                Why well-designed DynamoDB application require only one table?
                            
                                What is the most efficient way to query two collections in MongoDB for search results with pagination
                            
                                Securing document-style databases (MongoDb, CouchDb, RavenDb) for client (browser) access
                            
                                Datamodel for Property Graph over HBase/Cassandra

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to structure a DynamoDB database to allow queries for trending posts?

Tags:

nosql

amazon-dynamodb

Jason Pudzianowski

People also ask

1 Answers

Chen Harel

Recent Activity

Donate For Us