 

'Push' vs 'Pull' when designing social networks (Twitter, FB news feed, etc.)

We all know about push (fanout on write) vs pull (fanout on read) when designing a feed/twitter system on a social network.

In push mode, each time an author creates a new post, we write it to the list of updates (posts, tweets, etc.) of each of the author's friends (or followers), so that a follower doesn't need to query every followee's feed on each read.

In pull mode, a follower queries the feeds of all the friends he follows each time he wants to see their updates.
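To make the two modes concrete, here is a minimal in-memory sketch. The `FeedStore` class and its method names are hypothetical, and plain dictionaries stand in for whatever SQL or Redis storage a real system would use.

```python
from collections import defaultdict


class FeedStore:
    """In-memory stand-in for a feed backend (hypothetical, for illustration)."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> users who follow the author
        self.followees = defaultdict(set)  # user -> authors the user follows
        self.posts = defaultdict(list)     # author -> [(timestamp, text)]
        self.inbox = defaultdict(list)     # user -> [(timestamp, author, text)], push mode only

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.followees[follower].add(author)

    def post_pull(self, author, ts, text):
        """Fanout on read: store the post once, under its author."""
        self.posts[author].append((ts, text))

    def post_push(self, author, ts, text):
        """Fanout on write: also copy the post into every follower's inbox."""
        self.posts[author].append((ts, text))
        for follower in self.followers[author]:
            self.inbox[follower].append((ts, author, text))

    def read_pull(self, user):
        """Pull-mode read: gather and merge every followee's posts, newest first."""
        merged = [(ts, author, text)
                  for author in self.followees[user]
                  for ts, text in self.posts[author]]
        return sorted(merged, reverse=True)

    def read_push(self, user):
        """Push-mode read: a single lookup of the precomputed inbox, newest first."""
        return sorted(self.inbox[user], reverse=True)
```

The write cost in `post_push` grows with the number of followers, while the read cost in `read_pull` grows with the number of followees; that is the trade-off between the two modes.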

But in both cases, what mechanism is commonly used to let a person see updated feeds in REAL TIME on the website? (I would think FB or Twitter doesn't need you to manually refresh the page to see new posts from friends.)

Let's say John writes a post, and in push mode the system pushes (writes to SQL or a Redis cache) a pointer to this post into each of his friends' feeds. How would a friend's browser know that there's now an update from John?

user1008636 asked May 08 '18 15:05




2 Answers

I assume you have a dynamic (SPA) front-end.

In pull mode, you have two options:

  • Periodically re-fetch feed data, sending the last query time with each request so that only new feed items are returned. This approach works fine when starting a new project, but it won't scale well.

  • Use a message broker: after a new post is created, publish an event to all online clients whose feed is potentially updated, and have the client reload its feed when it receives such an event. You could also include the new content in the event payload itself.
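The polling option can be sketched as a small server-side filter. This is only an illustration: `fetch_new_items` is a hypothetical name, and the feed is assumed to be a list of `(timestamp, text)` pairs. As a variation on sending the client's last query time, the sketch returns the newest item's timestamp as the cursor for the next poll, which avoids depending on the client's clock.

```python
def fetch_new_items(feed, since):
    """Return the items newer than `since`, plus the cursor for the next poll.

    feed:  list of (timestamp, text) pairs (assumed shape, for illustration)
    since: timestamp cursor the client received from its previous poll
    """
    new_items = sorted(item for item in feed if item[0] > since)
    # If nothing is new, the client keeps its old cursor.
    next_cursor = new_items[-1][0] if new_items else since
    return new_items, next_cursor
```

A client would call this on a timer, pass back the returned cursor each time, and append whatever comes back to the visible feed.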

In push mode:

  • Periodically re-fetch feed data (since the feed query is not complex in push mode, this has much less performance overhead than in pull mode).

  • When you push, check whether the client has an active connection and, if so, publish an event to it at the same time.
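Publishing to connected clients could look roughly like the sketch below. Everything here is an assumption made for illustration: `ConnectionRegistry` and `publish_post` are hypothetical names, and a `queue.Queue` stands in for a real persistent connection to the browser (for example a WebSocket or a server-sent events stream).

```python
import queue


class ConnectionRegistry:
    """Tracks online users; each queue stands in for one open client connection."""

    def __init__(self):
        self._connections = {}

    def connect(self, user):
        q = queue.Queue()
        self._connections[user] = q
        return q

    def disconnect(self, user):
        self._connections.pop(user, None)

    def notify(self, user, event):
        """Deliver the event if the user is online; return whether it was delivered."""
        q = self._connections.get(user)
        if q is None:
            return False
        q.put(event)
        return True


def publish_post(author, text, followers, inboxes, registry):
    """Write the post to each follower's stored feed and notify those who are online."""
    notified = []
    for follower in followers:
        inboxes.setdefault(follower, []).append((author, text))
        event = {"type": "new_post", "author": author, "text": text}
        if registry.notify(follower, event):
            notified.append(follower)
    return notified
```

Offline followers still get the item in their stored feed and see it on their next load; only online followers receive the live event.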

Generally people use a hybrid approach:

  • For producers who have a lot of active consumers (logged in at least once in the last month), use the pull method.

  • For producers who have a smaller number of active consumers, use the push method.
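A hybrid read path might be sketched as follows. The threshold value and the function names are hypothetical; a real system would tune the cutoff from its own traffic.

```python
import heapq

# Hypothetical cutoff: authors with more active followers than this are not fanned out.
ACTIVE_FOLLOWER_THRESHOLD = 10_000


def choose_strategy(active_follower_count, threshold=ACTIVE_FOLLOWER_THRESHOLD):
    """Pick the delivery method for one producer."""
    return "pull" if active_follower_count > threshold else "push"


def read_hybrid_feed(pushed_items, pulled_posts, limit=20):
    """Merge the precomputed inbox with posts pulled from high-fanout authors.

    pushed_items: [(timestamp, author, text)] already fanned out to this user
    pulled_posts: {author: [(timestamp, text)]} fetched at read time
    """
    pulled = [(ts, author, text)
              for author, posts in pulled_posts.items()
              for ts, text in posts]
    # nlargest returns the newest `limit` items in descending timestamp order.
    return heapq.nlargest(limit, pushed_items + pulled)
```

The reader pays the pull cost only for the few high-fanout producers they follow, while everything else comes from the cheap precomputed list.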

In the push method it's very important to cap the number of items in a user's feed. If a user requests more feed items than the cap holds, you can fall back to pulling. Also, because of the cap, you don't need to push to inactive users (the items would probably be replaced by newer ones before they log in).
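The capped feed with a pull fallback can be illustrated with a bounded deque. `CappedFeed` is a hypothetical name, and `pull_older` is an assumed callback representing the slower pull-mode query for items older than a given timestamp.

```python
from collections import deque


class CappedFeed:
    """Precomputed feed that keeps only the newest `capacity` items."""

    def __init__(self, capacity):
        # With maxlen set, appendleft on a full deque discards the item at the right end.
        self.items = deque(maxlen=capacity)

    def push(self, item):
        """Add a (timestamp, text) item; items are assumed to arrive oldest to newest."""
        self.items.appendleft(item)

    def read(self, count, pull_older):
        """Return up to `count` items, newest first.

        If the capped list is too short, ask `pull_older(before_ts, n)` for the rest.
        """
        result = list(self.items)[:count]
        missing = count - len(result)
        if missing > 0:
            before_ts = result[-1][0] if result else None
            result += pull_older(before_ts, missing)
        return result
```

Most reads only touch the first page and never invoke the fallback, which is what keeps the capped list cheap to maintain.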

Arash Shakery answered Sep 28 '22 18:09



It's a common question in System Design interviews, usually asked of backend software engineers applying to FAANG or similar companies.

I found out from Facebook's paper why they preferred the Pull model in TAO, their graph data store, which serves timelines of posts, likes, and so on.

A single Facebook page may aggregate and filter hundreds of items from the social graph. We present each user with content tailored to them, and we filter every item with privacy checks that take into account the current viewer.

This extreme customization makes it infeasible to perform most aggregation and filtering when content is created; instead we resolve data dependencies and check privacy each time the content is viewed. As much as possible we pull the social graph, rather than pushing it.

This implementation strategy places extreme read demands on the graph data store; it must be efficient, highly available, and scale to high query rates.

– paper from 2013, TAO: Facebook’s Distributed Data Store for the Social Graph

There is another approach, which is used at Twitter: the Push method. I haven't looked into any source explaining why Twitter uses it to prefill personal timelines of tweets.

devishot answered Sep 28 '22 17:09
