I have created such system and I took this approach:
Database table with the following columns: id, userId, type, data, time.
This limits the searches/lookups, you can do in the feeds, to users, time and activity types, but in a facebook-type activity feed, this isn't really limiting. And with correct indices on the table the lookups are fast.
With this design you would have to decide what metadata each type of event should require. For example a feed activity for a new photo could look something like this:
{id:1, userId:1, type:PHOTO, time:2008-10-15 12:00:00, data:{photoId:2089, photoName:A trip to the beach}}
You can see that, although the name of the photo most certainly is stored in some other table containing the photos, and I could retrieve the name from there, I will duplicate the name in the metadata field, because you don't want to do any joins on other database tables if you want speed. And in order to display, say 200, different events from 50 different users, you need speed.
Then I have classes that extends a basic FeedActivity class for rendering the different types of activity entries. Grouping of events would be built in the rendering code as well, to keep away complexity from the database.
This is a very good presentation outlining how Etsy.com architected their activity streams. It's the best example I've found on the topic, though it's not rails specific.
http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture
We've open sourced our approach: https://github.com/tschellenbach/Stream-Framework It's currently the largest open source library aimed at solving this problem.
The same team which built Stream Framework also offers a hosted API, which handles the complexity for you. Have a look at getstream.io There are clients available for Node, Python, Rails and PHP.
In addition have a look at this high scalability post were we explain some of the design decisions involved: http://highscalability.com/blog/2013/10/28/design-decisions-for-scaling-your-high-traffic-feeds.html
This tutorial will help you setup a system like Pinterest's feed using Redis. It's quite easy to get started with.
To learn more about feed design I highly recommend reading some of the articles which we based Feedly on:
Though Stream Framework is Python based it wouldn't be too hard to use from a Ruby app. You could simply run it as a service and stick a small http API in front of it. We are considering adding an API to access Feedly from other languages. At the moment you'll have to role your own though.
The biggest issues with event streams are visibility and performance; you need to restrict the events displayed to be only the interesting ones for that particular user, and you need to keep the amount of time it takes to sort through and identify those events manageable. I've built a smallish social network; I found that at small scales, keeping an "events" table in a database works, but that it gets to be a performance problem under moderate load.
With a larger stream of messages and users, it's probably best to go with a messaging system, where events are sent as messages to individual profiles. This means that you can't easily subscribe to people's event streams and see previous events very easily, but you are simply rendering a small group of messages when you need to render the stream for a particular user.
I believe this was Twitter's original design flaw- I remember reading that they were hitting the database to pull in and filter their events. This had everything to do with architecture and nothing to do with Rails, which (unfortunately) gave birth to the "ruby doesn't scale" meme. I recently saw a presentation where the developer used Amazon's Simple Queue Service as their messaging backend for a twitter-like application that would have far higher scaling capabilities- it may be worth looking into SQS as part of your system, if your loads are high enough.
If you are willing to use a separate software I suggest the Graphity server which exactly solves the problem for activity streams (building on top of neo4j graph data base).
The algorithms have been implemented as a standalone REST server so that you can host your own server to deliver activity streams: http://www.rene-pickhardt.de/graphity-server-for-social-activity-streams-released-gplv3/
In the paper and benchmark I showed that retrieving news streams depends only linear on the amount of items you want to retrieve without any redundancy you would get from denormalizing the data:
http://www.rene-pickhardt.de/graphity-an-efficient-graph-model-for-retrieving-the-top-k-news-feeds-for-users-in-social-networks/
On the above link you find screencasts and a benchmark of this approach (showing that graphity is able to retrieve more than 10k streams per second).
I started to implement a system like this yesterday, here's where I've got to...
I created a StreamEvent class with the properties Id, ActorId, TypeId, Date, ObjectId and a hashtable of additional Details key/value pairs. This is represented in the database by a StreamEvent table (Id, ActorId, TypeId, Date, ObjectId) and a StreamEventDetails table (StreamEventId, DetailKey, DetailValue).
The ActorId, TypeId and ObjectId allow for a Subject-Verb-Object event to be captured (and later queried). Each action may result in several StreamEvent instances being created.
I've then created a sub-class for of StreamEvent each type of event, e.g. LoginEvent, PictureCommentEvent. Each of these subclasses has more context specific properties such as PictureId, ThumbNail, CommenText, etc (whatever is required for the event) which are actually stored as key/value pairs in the hashtable/StreamEventDetail table.
When pulling these events back from the database I use a factory method (based on the TypeId) to create the correct StreamEvent class.
Each subclass of StreamEvent has a Render(context As StreamContext) method which outputs the event to screen based on the passed StreamContext class. The StreamContext class allows options to be set based on the context of the view. If you look at Facebook for example your news feed on the homepage lists the fullnames (and links to their profile) of everyone involved in each action, whereas looking a friend's feed you only see their first name (but the full names of other actors).
I haven't implemented a aggregate feed (Facebook home) yet but I imagine I'll create a AggregateFeed table which has the fields UserId, StreamEventId which is populated based on some kind of 'Hmmm, you might find this interesting' algorithm.
Any comments would be massively appreciated.
// one entry per actual event events { id, timestamp, type, data } // one entry per event, per feed containing that event events_feeds { event_id, feed_id }
When the event is created, decide which feeds it appears in and add those to events_feeds. To get a feed, select from events_feeds, join in events, order by timestamp. Filtering and aggregation can then be done on the results of that query. With this model, you can change the event properties after creation with no extra work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With