I know variations of this question have been asked many times before (and I've read them, two of them being: 1, 2), but I just can't wrap my head around anything that feels like the right solution.
Everything has been suggested, from many-to-many relations to fanout, polymorphic associations, NoSQL solutions, message queues, denormalization, and combinations of them all.
I know this question is very situational, so I'll briefly explain mine:
For the time being, I ended up going with a denormalized setup, basically an events table consisting of: id, date, user_id, action, root_id, object_id, object, data.
user_id is the person who triggered the event, action is the action, root_id is the user the object belongs to, object is the object type, and data contains the minimum amount of information needed to render the event in a user's stream.
Then, to get the desired events, I just grab all rows in which user_id is the id of a user followed by the user whose stream we're building.
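A minimal sketch of the setup described above, using SQLite from Python's standard library. The column types, the JSON encoding of data, the index, and the follows(follower_id, followed_id) table are all assumptions added for illustration; the question only names the events columns. The point is that rendering a stream needs one join against the follow graph and no joins against the object tables.

```python
import json
import sqlite3

# Sketch of the question's denormalized events table. Column types, the JSON
# payload format, and the "follows" table are assumptions for illustration.
SCHEMA = """
CREATE TABLE events (
    id        INTEGER PRIMARY KEY,
    "date"    TEXT    NOT NULL,  -- ISO-8601 timestamp (assumed format)
    user_id   INTEGER NOT NULL,  -- who triggered the event
    "action"  TEXT    NOT NULL,  -- e.g. 'comment', 'like' (hypothetical values)
    root_id   INTEGER NOT NULL,  -- user who owns the object
    object_id INTEGER NOT NULL,
    "object"  TEXT    NOT NULL,  -- object type, e.g. 'photo'
    "data"    TEXT    NOT NULL   -- JSON with just enough to render the event
);
CREATE TABLE follows (           -- hypothetical: who follows whom
    follower_id INTEGER NOT NULL,
    followed_id INTEGER NOT NULL,
    PRIMARY KEY (follower_id, followed_id)
);
CREATE INDEX events_user_date ON events (user_id, "date");
"""

def stream_for(conn, viewer_id, limit=20):
    """Return the newest events triggered by users that viewer_id follows."""
    rows = conn.execute(
        """
        SELECT e.id, e."date", e.user_id, e."action", e."object", e."data"
        FROM events AS e
        JOIN follows AS f ON f.followed_id = e.user_id
        WHERE f.follower_id = ?
        ORDER BY e."date" DESC
        LIMIT ?
        """,
        (viewer_id, limit),
    ).fetchall()
    # Decode the denormalized payload; no further joins are needed to render.
    return [(r[0], r[1], r[2], r[3], r[4], json.loads(r[5])) for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3)])
conn.executemany(
    'INSERT INTO events (id, "date", user_id, "action", root_id, object_id, "object", "data") '
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "2024-01-01T10:00:00", 2, "comment", 4, 7, "photo", json.dumps({"text": "Nice!"})),
        (2, "2024-01-02T09:00:00", 3, "like", 2, 8, "post", json.dumps({"title": "Hello"})),
        (3, "2024-01-03T08:00:00", 5, "like", 2, 8, "post", json.dumps({"title": "Hello"})),
    ],
)
for event in stream_for(conn, viewer_id=1):
    print(event)
```

User 1 follows users 2 and 3, so event 3 (triggered by user 5) is left out and the other two come back newest first.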
It works, but the denormalization just feels wrong. Polymorphic associations feel similarly wrong. Fanout seems to be somewhere in between, but feels very messy.
With all my searching on the issue, and reading the numerous questions here on SO, I just can't get anything to click and feel like the right solution.
Any experience, insight, or help anyone can offer is greatly appreciated. Thanks.
You should always start from building a clean and high-performance normalized database. Only if you need your database to perform better at particular tasks (such as reporting) should you opt for denormalization. If you do denormalize, be careful and make sure to document all changes you make to the database.
Denormalization is the process of adding precomputed redundant data to an otherwise normalized relational database to improve read performance of the database. Normalizing a database involves removing redundancy so only a single copy exists of each piece of information.
Normalization is the technique of dividing data into multiple tables to reduce redundancy and inconsistency and to achieve data integrity. Denormalization, on the other hand, is the technique of combining data into fewer tables (often a single one) to make data retrieval faster.
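A small sketch of that trade-off in SQLite. The users, comments, and comment_feed tables are hypothetical and chosen only to show the mechanism: the denormalized table answers the read without a join, but its precomputed copy of the author's name goes stale unless every copy is updated as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: each fact stored once; names live only in users.
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    body TEXT NOT NULL
);
-- Denormalized: a precomputed copy of the author's name is stored with each row.
CREATE TABLE comment_feed (
    comment_id INTEGER PRIMARY KEY,
    author_name TEXT NOT NULL,
    body TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO comments VALUES (10, 1, 'First!');
INSERT INTO comment_feed VALUES (10, 'alice', 'First!');
""")

def normalized_read():
    # Needs a join at read time, but there is only one copy of the name.
    return conn.execute(
        "SELECT u.name, c.body FROM comments c JOIN users u ON u.id = c.user_id"
    ).fetchall()

def denormalized_read():
    # Single-table read with no join: this is the read-performance win.
    return conn.execute("SELECT author_name, body FROM comment_feed").fetchall()

# The cost: a rename must be applied to every redundant copy,
# otherwise the two representations drift apart.
conn.execute("UPDATE users SET name = 'alice2' WHERE id = 1")
print(normalized_read())    # [('alice2', 'First!')]
print(denormalized_read())  # [('alice', 'First!')]  <- stale until also updated
```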
I've never dealt with social activity feeds, but based on your description they're quite similar to maintaining tricky business activity logs.
Personally, I tend to handle this case with separate tables for the applicable activity types, a revisions/logs table for each of those types, and a reference from each of those log tables to a more central event logs table.
The latter is what lets you build the feed, and it looks a lot like the solution you came up with: event_id, event_at, event_name, event_by, event_summary, event_type. (The event_type field is a varchar containing the name of the table or object.)
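A rough sketch of that layout in SQLite, assuming a single hypothetical activity type (friend requests). The event_logs columns are the ones named above; the friend_requests and friend_request_logs tables, the 'friend_request.sent' event name, and the summary wording are illustrative assumptions, not the answerer's actual schema. Each action writes the normalized row, a central event, and a per-type log row in one transaction, and the feed is read from the central table alone.

```python
import sqlite3

# Central event log plus, per activity type, a normalized table and a
# revisions/log table pointing at the central log. "friend_requests" is a
# hypothetical activity type chosen for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event_logs (
    event_id      INTEGER PRIMARY KEY,
    event_at      TEXT    NOT NULL,
    event_name    TEXT    NOT NULL,  -- e.g. 'friend_request.sent' (hypothetical)
    event_by      INTEGER NOT NULL,  -- user who triggered it
    event_summary TEXT    NOT NULL,  -- enough text to render a feed item
    event_type    TEXT    NOT NULL   -- name of the table/object
);
CREATE TABLE friend_requests (       -- normalized current state
    id        INTEGER PRIMARY KEY,
    from_user INTEGER NOT NULL,
    to_user   INTEGER NOT NULL,
    status    TEXT    NOT NULL
);
CREATE TABLE friend_request_logs (   -- history of this type, tied to the central log
    id                INTEGER PRIMARY KEY,
    friend_request_id INTEGER NOT NULL REFERENCES friend_requests(id),
    event_id          INTEGER NOT NULL REFERENCES event_logs(event_id),
    status            TEXT    NOT NULL
);
""")

def send_friend_request(req_id, from_user, to_user, at):
    # One transaction writes the normalized row, the central event, and the log row.
    with conn:
        conn.execute("INSERT INTO friend_requests VALUES (?, ?, ?, 'pending')",
                     (req_id, from_user, to_user))
        cur = conn.execute(
            "INSERT INTO event_logs (event_at, event_name, event_by, event_summary, event_type) "
            "VALUES (?, 'friend_request.sent', ?, ?, 'friend_requests')",
            (at, from_user, f"user {from_user} sent a friend request to user {to_user}"),
        )
        conn.execute(
            "INSERT INTO friend_request_logs (friend_request_id, event_id, status) "
            "VALUES (?, ?, 'pending')",
            (req_id, cur.lastrowid),
        )

send_friend_request(1, from_user=2, to_user=3, at="2024-01-01T10:00:00")

# The feed comes from the central table alone; event_type says which table
# to consult if the full normalized details are needed.
feed = conn.execute(
    "SELECT event_at, event_summary, event_type FROM event_logs ORDER BY event_at DESC"
).fetchall()
print(feed)
```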
You probably don't need to maintain the history of everything in your case (surely this is less appropriate for friend requests than for sales and stock movements), but maintaining some kind of central event logs table (in addition to the other applicable tables that hold the normalized data) is, I think, the correct approach.
You might get some interesting insights by looking at audit log related questions:
https://stackoverflow.com/search?q=audit+log