'Push' vs 'Pull' when designing social networks (Twitter, FB News Feed, etc.)

Tags: architecture, system

We all know about push (fanout on write) vs pull (fanout on read) when designing a feed/Twitter-like system on a social network.

In push mode, each time an author creates a new post, we write it to the list of updates (posts, tweets, etc.) of each of the author's friends (or followers), so that followers don't need to query all of their followees' feeds every time.
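A minimal in-memory sketch of fanout on write, assuming plain dicts stand in for the SQL tables or Redis cache; `followers`, `home_feeds`, `follow`, `publish_post`, and `read_feed` are hypothetical names used only for illustration:

```python
from collections import defaultdict

# Hypothetical in-memory stores standing in for SQL tables or a Redis cache.
followers: dict[str, set[str]] = defaultdict(set)     # author -> followers
home_feeds: dict[str, list[int]] = defaultdict(list)  # user -> post ids, newest first


def follow(follower: str, author: str) -> None:
    followers[author].add(follower)


def publish_post(author: str, post_id: int) -> None:
    """Fanout on write: copy the post pointer into every follower's feed."""
    for follower in followers[author]:
        home_feeds[follower].insert(0, post_id)


def read_feed(user: str) -> list[int]:
    # Reading is a single lookup; no per-followee queries are needed.
    return home_feeds[user]
```

Writes cost one insert per follower while reads are a single lookup, which is why push gets expensive for authors with very many followers.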

In pull mode, we let a follower query the feeds of all the friends he follows each time he wants to see his friends' posts.
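For comparison, a minimal sketch of fanout on read, assuming each author's posts are kept newest first as `(timestamp, post_id)` pairs; `posts_by_author`, `following`, `write_post`, and `read_feed` are hypothetical names. The merge uses the standard library's `heapq.merge`:

```python
import heapq
import itertools
from collections import defaultdict

# Hypothetical stores: each author's own posts as (timestamp, post_id), newest first.
posts_by_author: dict[str, list[tuple[int, int]]] = defaultdict(list)
following: dict[str, set[str]] = defaultdict(set)  # user -> authors they follow


def write_post(author: str, ts: int, post_id: int) -> None:
    # A write touches only the author's own list (assumes ts increases per author).
    posts_by_author[author].insert(0, (ts, post_id))


def read_feed(user: str, limit: int = 20) -> list[int]:
    """Fanout on read: merge every followee's list at query time."""
    lists = [posts_by_author[a] for a in following[user]]
    merged = heapq.merge(*lists, key=lambda p: p[0], reverse=True)
    return [post_id for _, post_id in itertools.islice(merged, limit)]
```

Here the cost moves to read time: every feed request touches one list per followee.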

But in both cases, what mechanism is commonly used to let a person see feed updates in REAL TIME on the website? (I assume FB or Twitter doesn't require you to manually refresh the page to see new posts from friends.)

Let's say John writes a post, and in push mode the system pushes this post's pointer into all of his friends' feeds (writing to SQL or a Redis cache). How would one of his friends' browsers know that there's now an update from John?


asked May 08 '18 15:05

user1008636

2 Answers

I assume you have a dynamic (SPA) front-end.

In pull mode, you have two options:

Periodically re-fetch feed data, sending the last query time each time so that only new feed items are returned. This approach works fine when starting a new project, but it won't scale well.
Have a message broker: after a new post is created, publish events to all online clients whose feeds are potentially affected, and on the client side reload the feed after receiving such an event. You could also include the new content in the event payload itself.
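Both pull-mode options can be sketched in a few lines, assuming a `queue.Queue` per online user stands in for a real broker channel plus a WebSocket or SSE connection; `posts`, `online`, `follow`, `poll`, and `create_post` are hypothetical names, not any specific broker's API:

```python
import queue
from collections import defaultdict

# Hypothetical stores; a queue.Queue stands in for a live WebSocket/SSE connection.
posts: list[tuple[int, str, int]] = []             # (timestamp, author, post_id)
following: dict[str, set[str]] = defaultdict(set)  # user -> followees
followers: dict[str, set[str]] = defaultdict(set)  # author -> followers
online: dict[str, queue.Queue] = {}                # user -> event channel


def follow(user: str, author: str) -> None:
    following[user].add(author)
    followers[author].add(user)


def poll(user: str, since: int) -> list[int]:
    """Option 1: the client re-fetches periodically, sending its last query time."""
    return [pid for ts, author, pid in posts
            if ts > since and author in following[user]]


def create_post(author: str, ts: int, post_id: int) -> None:
    posts.append((ts, author, post_id))
    # Option 2: publish an event only to followers with a live connection;
    # the payload can carry the new content itself.
    for f in followers[author]:
        if f in online:
            online[f].put({"type": "feed_updated", "author": author, "post_id": post_id})
```

The polling query scans posts on every request from every client, even when nothing changed, which is the scaling problem mentioned above; the event path only does work when a post is actually created.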

In push mode:

Periodically re-fetch feed data (since the feed query is simple, this has much less performance overhead than in pull mode).
When you push, check whether the client has an active connection and publish events at the same time.
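A minimal sketch of the second push-mode option, assuming `connections` maps online users to a `queue.Queue` that stands in for their live connection; `push_post` and the other names are hypothetical:

```python
import queue
from collections import defaultdict

followers: dict[str, set[str]] = defaultdict(set)     # author -> followers
home_feeds: dict[str, list[int]] = defaultdict(list)  # user -> post ids, newest first
connections: dict[str, queue.Queue] = {}              # hypothetical: user -> live connection


def push_post(author: str, post_id: int) -> int:
    """Write into each follower's feed and, in the same pass, notify the online ones.

    Returns the number of real-time events sent.
    """
    sent = 0
    for follower in followers[author]:
        home_feeds[follower].insert(0, post_id)
        conn = connections.get(follower)
        if conn is not None:          # only clients with an active connection
            conn.put(post_id)         # the browser can render it without a refresh
            sent += 1
    return sent
```

Offline followers still get the item in their stored feed and see it on their next fetch.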

Generally people use a hybrid approach:

For producers who have a lot of active consumers (consumers who logged in at least once in the last month), use the pull method.
For producers with a smaller number of active consumers, use the push method.
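A minimal sketch of the hybrid read path, assuming a fixed threshold on active followers decides whether an author is pushed or pulled; `PULL_THRESHOLD` and all store and function names are hypothetical, and how "active" is computed is left out:

```python
import heapq
import itertools
from collections import defaultdict

PULL_THRESHOLD = 3  # hypothetical cut-off; real systems would tune this

active_followers: dict[str, set[str]] = defaultdict(set)          # author -> active followers
following: dict[str, set[str]] = defaultdict(set)                 # user -> followees
own_posts: dict[str, list[tuple[int, int]]] = defaultdict(list)   # author -> (ts, id), newest first
home_feeds: dict[str, list[tuple[int, int]]] = defaultdict(list)  # user -> pushed (ts, id), newest first


def uses_pull(author: str) -> bool:
    return len(active_followers[author]) >= PULL_THRESHOLD


def create_post(author: str, ts: int, post_id: int) -> None:
    own_posts[author].insert(0, (ts, post_id))
    if not uses_pull(author):
        for f in active_followers[author]:  # push only for smaller audiences
            home_feeds[f].insert(0, (ts, post_id))


def read_feed(user: str, limit: int = 20) -> list[int]:
    # Merge the pushed feed with posts pulled from high-fanout authors.
    pulled = [own_posts[a] for a in following[user] if uses_pull(a)]
    merged = heapq.merge(home_feeds[user], *pulled, key=lambda p: p[0], reverse=True)
    return [pid for _, pid in itertools.islice(merged, limit)]
```

This sketch ignores authors who cross the threshold after some posts were already pushed; a real system would need to de-duplicate in that case.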

In the push method it's very important to cap the number of items in a user's feed. If a user requests more feed items than the cap, you can fall back to pulling. Also, since there is a cap, you don't need to push to inactive users (the items would probably be replaced by newer ones before they log in).
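A minimal sketch of a capped pushed feed, assuming `collections.deque(maxlen=...)` as the capped list (it discards the oldest item when full) and a caller-supplied `pull_fallback` function; `FEED_CAP`, `fan_out`, `read_page`, and `pull_fallback` are hypothetical names:

```python
import itertools
from collections import defaultdict, deque
from typing import Callable

FEED_CAP = 3  # hypothetical; a real cap would be much larger

# user -> pushed post ids, newest first; deque(maxlen=...) drops the oldest item.
feeds: dict[str, deque] = defaultdict(lambda: deque(maxlen=FEED_CAP))


def fan_out(post_id: int, followers: set[str], active_users: set[str]) -> None:
    for f in followers & active_users:  # skip inactive users entirely
        feeds[f].appendleft(post_id)


def read_page(user: str, offset: int, limit: int,
              pull_fallback: Callable[[str, int, int], list[int]]) -> list[int]:
    """Serve from the capped feed; beyond the cap, fall back to pulling."""
    feed = feeds[user]
    if offset + limit <= len(feed):
        return list(itertools.islice(feed, offset, offset + limit))
    return pull_fallback(user, offset, limit)
```

An inactive user has no pushed feed at all, so their first request after logging in also goes through the pull fallback.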


answered Sep 28 '22 18:09

Arash Shakery

It's a common question in system design interviews, usually asked of backend software engineers applying to FAANG or similar companies.

I found out from Facebook's paper why they preferred the pull model in their TAO graph data store, which served timelines of posts, likes, and so on:

A single Facebook page may aggregate and filter hundreds of items from the social graph. We present each user with content tailored to them, and we filter every item with privacy checks that take into account the current viewer.

This extreme customization makes it infeasible to perform most aggregation and filtering when content is created; instead we resolve data dependencies and check privacy each time the content is viewed. As much as possible we pull the social graph, rather than pushing it.

This implementation strategy places extreme read demands on the graph data store; it must be efficient, highly available, and scale to high query rates.

– paper from 2013, TAO: Facebook’s Distributed Data Store for the Social Graph
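To illustrate why per-viewer filtering pushes work to read time, here is a minimal sketch in which privacy is checked against the current viewer and the current settings each time the feed is rendered. This is not TAO's API; `audience`, `authors`, `post`, `can_view`, and `render_feed` are hypothetical names:

```python
from typing import Dict, List, Optional, Set

# Hypothetical privacy settings, looked up when the feed is viewed, not when posted.
audience: Dict[int, Optional[Set[str]]] = {}  # item id -> allowed viewers (None = public)
authors: Dict[int, str] = {}                  # item id -> author


def post(item_id: int, author: str, allowed: Optional[Set[str]] = None) -> None:
    authors[item_id] = author
    audience[item_id] = allowed


def can_view(viewer: str, item_id: int) -> bool:
    allowed = audience[item_id]
    return allowed is None or viewer == authors[item_id] or viewer in allowed


def render_feed(viewer: str, candidate_ids: List[int]) -> List[int]:
    """Filter candidate items for this viewer at read time."""
    return [i for i in candidate_ids if can_view(viewer, i)]
```

If the visibility decision were baked into a pushed feed at write time, a later privacy change would leave that feed stale; checking on every view always reflects the current settings.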

Twitter uses the other approach, the push method, to prefill personal timelines of tweets. I haven't looked into any sources explaining why Twitter chose it.

answered Sep 28 '22 17:09

devishot

