I have chosen DynamoDB as the backend for my activity feed/events data but am having some trouble deciding on the best data structure to use.
Firstly, I should explain that activity IDs for each user are stored in Redis sorted sets (for personal profile activities) and in Redis lists (for an individual's activity stream). This means that any activity tables I have in DynamoDB will only need a hash key, with no need for range keys or local or global secondary indexes, since the activities are essentially being indexed in Redis.
We are doing this so that we can effectively aggregate feed and profile activity data by manipulating the ID lists and sets in Redis.
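Roughly, the write path looks something like this (a minimal sketch in Python with redis-py 3.x; the key names are illustrative, not our actual schema):

```python
import time
import redis

r = redis.Redis()

def record_activity(user_id, activity_id, created_at=None):
    """Index a new activity ID in Redis (illustrative key names)."""
    created_at = created_at or time.time()
    # Profile: sorted set scored by creation time, so history stays ordered
    r.zadd("profile:%s" % user_id, {activity_id: created_at})
    # Stream: plain list, newest activity first
    r.lpush("stream:%s" % user_id, activity_id)
```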
Anyway... Our initial plan was to create a DynamoDB table for each month, store the activity data there... then dial down the provisioned throughput for older tables as they age, keeping the most recent data fast and available while keeping the cost down for old data.
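Dialling the throughput down would just be a periodic UpdateTable call against the older monthly tables, something along these lines (a sketch with boto3; the table name and capacity numbers are illustrative):

```python
import boto3

client = boto3.client("dynamodb")

def cool_down(table_name, read_units=1, write_units=1):
    """Drop the provisioned throughput on an ageing monthly table."""
    client.update_table(
        TableName=table_name,
        ProvisionedThroughput={
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    )

# e.g. run from a monthly scheduled job against last year's tables
cool_down("activities_2014_06")
```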
While this technique works very well for the activity stream itself, it does not work when viewing a user's profile (and their own historic activities), since, in a manner similar to Facebook's timeline, users are able to view activities all the way back to their birth and can add custom life events to their profile. This requirement would mean having a table for each month of the last 80 years or so, so we need something else.
Currently we are toying with the idea of splitting the activity tables by activity type, e.g.:
activities_comments
activities_likes
activities_uploads
activities_posts
... And so on.
We would need around 20 tables to cover all our current activity types. Using this method would allow us to selectively provision throughput for the most commonly occurring activity types, which to us seems preferable to keeping a single activity table with a huge and expensive provisioned throughput.
In Redis, we would simply add a table suffix to each activity ID so that we know which table the activity metadata is stored in; we would then be able to query the data as follows:
For activity streams:
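Something along these lines, assuming the IDs stored in Redis carry the table suffix in the form "<id>:<suffix>" (a sketch; key, table and attribute names are illustrative):

```python
import boto3
import redis

r = redis.Redis()
dynamodb = boto3.resource("dynamodb")

def get_stream_page(user_id, offset=0, limit=20):
    """Fetch a page of activity IDs from the Redis stream list, then load
    the metadata from the per-type DynamoDB tables."""
    raw_ids = r.lrange("stream:%s" % user_id, offset, offset + limit - 1)
    activities = []
    for raw in raw_ids:
        activity_id, suffix = raw.decode().rsplit(":", 1)
        table = dynamodb.Table("activities_%s" % suffix)
        item = table.get_item(Key={"activity_id": activity_id}).get("Item")
        if item:
            activities.append(item)
    return activities
```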
For user profiles:
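And similarly for a user's own profile, reading newest-first from the sorted set (same illustrative names as above):

```python
import boto3
import redis

r = redis.Redis()
dynamodb = boto3.resource("dynamodb")

def get_profile_page(user_id, offset=0, limit=20):
    """Newest-first page of the user's own historic activities from the
    profile sorted set, with metadata loaded from the per-type tables."""
    raw_ids = r.zrevrange("profile:%s" % user_id, offset, offset + limit - 1)
    activities = []
    for raw in raw_ids:
        activity_id, suffix = raw.decode().rsplit(":", 1)
        item = dynamodb.Table("activities_%s" % suffix).get_item(
            Key={"activity_id": activity_id}
        ).get("Item")
        if item:
            activities.append(item)
    return activities
```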
The aggregation of data will be done offline: we will analyse the Redis lists/sorted sets for similar activities occurring in a given time period, then create a new activity with the aggregated metadata, add it to DynamoDB, insert the new activity into Redis at the correct place, and finally remove all the old related activities from the Redis lists/sets.
e.g.
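A heavily simplified sketch of that offline pass (the aggregation rule, table and key names are illustrative, matching the earlier sketches):

```python
import time
import uuid

import boto3
import redis

r = redis.Redis()
dynamodb = boto3.resource("dynamodb")

def aggregate_likes(user_id, like_ids, created_at=None):
    """Collapse several individual 'like' activities from a user's profile
    sorted set into one aggregated activity."""
    created_at = created_at or int(time.time())
    agg_id = "agg_%s" % uuid.uuid4().hex

    # 1. Store the aggregated activity's metadata in DynamoDB
    dynamodb.Table("activities_likes").put_item(Item={
        "activity_id": agg_id,
        "type": "aggregated_likes",
        "child_ids": like_ids,
        "created_at": created_at,
    })

    # 2. Add the new activity to Redis at the correct (time-ordered) place
    r.zadd("profile:%s" % user_id, {"%s:likes" % agg_id: created_at})

    # 3. Remove the old individual activities it replaces
    r.zrem("profile:%s" % user_id, *["%s:likes" % i for i in like_ids])
```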
The above is actually substantially more complicated and takes into account the most-popular-post and activity-weighting logic we have developed... but it gives you a rough idea.
So, now that I've described the solution we are currently thinking of going with, what I would like to know is whether this is a sensible way to structure the activity data, or whether there is a better approach.
I know this is kind of a vague question and that there's a lot to read, but any opinions or comments would be greatly appreciated.
NOTE: For the sake of completeness I should state that activity IDs are pushed out on write into a user's followers' activity streams in Redis, though we are not averse to switching to fan-out on read, should someone convince us of its benefits in their answer.
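For reference, that fan-out step is essentially the following (a sketch; the follower lookup, key names and trim length are illustrative):

```python
import redis

r = redis.Redis()

def fan_out_on_write(activity_id, follower_ids, max_len=1000):
    """Push a newly created activity ID onto each follower's stream list,
    trimming the lists so they don't grow without bound."""
    pipe = r.pipeline()
    for follower_id in follower_ids:
        pipe.lpush("stream:%s" % follower_id, activity_id)
        pipe.ltrim("stream:%s" % follower_id, 0, max_len - 1)
    pipe.execute()
```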
Building activity feeds and newsfeeds on DynamoDB requires a lot of additional infrastructure because of how you propagate data (fan-out on write), which usually results in a lot of provisioning drama and high costs.
I wrote an article describing the challenges with running newsfeeds on DynamoDB here.
Disclaimer: I am the CTO and one of the co-founders of Stream.
You could enable DynamoDB Streams on your activity tables and attach Lambda functions to them to incrementally aggregate activities into your Redis structures. Using time-series tables is a recommended practice for managing the cost of provisioned throughput on hot/cold data. However, there are practical constraints, such as the per-account, per-region limit of 256 tables, that may limit your ability to keep all of the data in DynamoDB. The same Lambda function could maintain caches of activity counts over a sliding window, which you could use to collapse many small activities into aggregate activities.
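A rough sketch of such a Lambda handler (Python; the stream record attribute names, Redis keys and connection details are placeholders, not a prescription):

```python
import redis

# Connection details are placeholders; in practice this would point at
# your ElastiCache / Redis endpoint
r = redis.Redis(host="localhost", port=6379)

def lambda_handler(event, context):
    """Invoked by DynamoDB Streams: index new activities in Redis and
    maintain per-hour counters for sliding-window aggregation."""
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]
        user_id = image["user_id"]["S"]
        activity_id = image["activity_id"]["S"]
        created_at = int(image["created_at"]["N"])

        # Keep the Redis profile index in sync with the new table item
        r.zadd("profile:%s" % user_id, {activity_id: created_at})

        # Per-user, per-hour counter used to decide when to collapse many
        # small activities into an aggregate activity
        bucket = "count:%s:%d" % (user_id, created_at // 3600)
        r.incr(bucket)
        r.expire(bucket, 7 * 24 * 3600)
```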