 

Event feed implementation - will it scale?

Situation:

I am currently designing a feed system for a social website where each user has a feed of their friends' activities. I have two possible methods for generating the feeds and I would like to ask which is better in terms of ability to scale.

Events from all users are collected in one central database table, event_log. Users are paired as friends in the table friends. The RDBMS we are using is MySQL.
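For concreteness, here is roughly what I have in mind for the two tables (a simplified sketch; column names are illustrative rather than the real schema):

    CREATE TABLE event_log (
        event_id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        user_id    BIGINT UNSIGNED NOT NULL,        -- user who performed the action
        event_type VARCHAR(32)     NOT NULL,
        created_at DATETIME        NOT NULL,
        processed  TINYINT(1)      NOT NULL DEFAULT 0,  -- flag for the background task
        KEY idx_user_created (user_id, created_at),
        KEY idx_processed (processed)
    );

    CREATE TABLE friends (
        user_id   BIGINT UNSIGNED NOT NULL,          -- owner of the friendship row
        friend_id BIGINT UNSIGNED NOT NULL,          -- the befriended user
        PRIMARY KEY (user_id, friend_id)
    );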

Standard method: When a user requests their feed page, the system generates the feed by inner joining event_log with friends. The result is then cached and set to time out after 5 minutes. Scaling is achieved by varying this timeout.
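As a rough illustration (against the sketch schema above), the feed query would be something along these lines, with its result cached for 5 minutes:

    -- feed for the requesting user
    SELECT e.event_id, e.user_id, e.event_type, e.created_at
    FROM friends AS f
    INNER JOIN event_log AS e ON e.user_id = f.friend_id
    WHERE f.user_id = ?          -- the user viewing their feed
    ORDER BY e.created_at DESC
    LIMIT 50;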

Hypothesised method: A task runs in the background and for each new, unprocessed item in event_log, it creates entries in the database table user_feed pairing that event with all of the users who are friends with the user who initiated the event. One table row pairs one event with one user.
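Sketched in SQL (again against the illustrative schema, and ignoring batching and locking for clarity), each run of the background task would do roughly:

    CREATE TABLE user_feed (
        user_id  BIGINT UNSIGNED NOT NULL,       -- recipient of the feed entry
        event_id BIGINT UNSIGNED NOT NULL,
        PRIMARY KEY (user_id, event_id)
    );

    -- fan each unprocessed event out to the friends of the user who created it
    INSERT INTO user_feed (user_id, event_id)
    SELECT f.user_id, e.event_id
    FROM event_log AS e
    INNER JOIN friends AS f ON f.friend_id = e.user_id
    WHERE e.processed = 0;

    -- mark those events as handled; in practice both statements would run in one transaction
    UPDATE event_log SET processed = 1 WHERE processed = 0;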

The problems with the standard method are well known: what happens if a lot of users' caches expire at the same time? The approach also does not scale well, since the brief is for feeds to update as close to real time as possible.

The hypothesised solution seems much better to me: all processing is done offline, so no user waits for a page to generate, and there are no joins, so the database tables can be sharded across physical machines. However, if a user has 100,000 friends and creates 20 events in one session, that results in 2,000,000 rows being inserted into the database.

Question:

The question boils down to two points:

  • Is the worst-case scenario mentioned above problematic, i.e. does table size have an impact on MySQL performance, and are there any issues with mass-inserting rows for each event?
  • Is there anything else I have missed?
asked Nov 15 '22 by SlappyTheFish


1 Answer

I think your hypothesised system generates too much data. Firstly, on the global scale, the storage and indexing requirements on user_feed escalate rapidly as your user base becomes larger and more interconnected (both presumably desirable for a social network). Secondly, consider what happens if in the course of a minute 1,000 users each post a new message and each has 100 friends: your background thread then has 100,000 inserts to do and might quickly fall behind.

I wonder if a compromise might be made between your two proposed solutions, where a background thread updates a table last_user_feed_update containing a single row for each user and a timestamp for the last time that user's feed was changed.

Then, although the full join and query would still be required to refresh the feed, a quick query against last_user_feed_update will tell whether a refresh is needed at all. This seems to mitigate the biggest problems with your standard method as well as avoid the storage size difficulties, but that background thread still has a lot of work to do.
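A minimal sketch of what I mean (table and column names are only illustrative):

    CREATE TABLE last_user_feed_update (
        user_id    BIGINT UNSIGNED NOT NULL PRIMARY KEY,
        updated_at DATETIME        NOT NULL
    );

    -- background thread: when a user posts an event, touch every friend's row
    INSERT INTO last_user_feed_update (user_id, updated_at)
    SELECT f.user_id, NOW()
    FROM friends AS f
    WHERE f.friend_id = ?               -- the user who created the event
    ON DUPLICATE KEY UPDATE updated_at = NOW();

    -- page request: compare this timestamp with the cached feed's build time;
    -- only re-run the expensive join if the feed is actually stale
    SELECT updated_at FROM last_user_feed_update WHERE user_id = ?;

This keeps the per-event write to one small row per friend rather than a full copy of the event, so the storage cost stays bounded by the number of users.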

answered Dec 19 '22 by Elemental