
How does StackOverflow optimise the performance for the display of the questions?

I am trying to learn C# and .NET so that I can build a web app.

I was happy to discover that Stack Overflow itself is built with C# and .NET.

I noticed that whenever I refresh the home page or the questions section, the page always shows the latest information, without fail and at an acceptable speed.

I am not sure how you do it. Sorry for the long series of questions; I am trying to learn the best practices for data retrieval, paging, performance, etc.

I know that the homepage only returns a limited number of questions and their stats, but the questions section actually returns everything.

How do you optimise it?

  1. For the homepage, do you always grab ALL the stats of the recent questions, so that your query is something like "select * from questions order by datetime_created desc limit 20"?

    So the * contains ALL the info including question title, id, views, etc?

    Do you use HttpContext.Current.Cache to help with that?

  2. For the questions, this is even more intriguing.

    How do you do the paging?

    Do you always grab from the database only the results for the specific page?

    Or do you grab all the results, store them in a DataSet, and then use some kind of DataGrid control to help with the paging?

If it is the latter, how do you keep the data up to date?

Kim Stacks asked Mar 24 '09 06:03


1 Answer

Here on Stack Overflow, we try to use aggressive caching on many levels:

  • pages entirely cached by IIS' output cache, regardless of user authentication
  • pages cached only for anonymous users; registered users see the most recent content
  • portions of pages' HTML cached for everyone; HttpRuntime.Cache is used for this

The home page is made up of three cached HTML pieces - recent questions, recent tags, recent badges - each with a different duration.
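
To make the "three pieces, each with a different duration" idea concrete, here is a minimal sketch of a per-fragment cache with its own time-to-live per entry. It is written in Python purely for brevity and is not Stack Overflow's code or the ASP.NET API; `FragmentCache`, `get_or_render`, `render_home_page`, the cache keys, and the durations are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class FragmentCache:
    """Tiny in-memory cache where every entry carries its own time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, html)

    def get_or_render(self, key: str, ttl_seconds: float,
                      render: Callable[[], str]) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: reuse the cached HTML
        html = render()  # missing or expired: rebuild the fragment
        self._entries[key] = (now + ttl_seconds, html)
        return html


def render_home_page(cache: FragmentCache) -> str:
    # The durations below are invented; the answer only says that they differ.
    questions = cache.get_or_render("recent-questions", 30, lambda: "<ul>questions</ul>")
    tags = cache.get_or_render("recent-tags", 300, lambda: "<ul>tags</ul>")
    badges = cache.get_or_render("recent-badges", 600, lambda: "<ul>badges</ul>")
    return questions + tags + badges
```

Giving each fragment its own lifetime means a fast-changing piece can be rebuilt often while slower-changing pieces are reused for much longer.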

A questions list page caches the ids (Int32[]) of all questions for a particular sort/tag filter, which makes paging trivial (a page is simply a slice of the cached array). The stats (e.g. question count, related tag counts) are cached as well.
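
As a rough illustration of why a cached id array makes paging easy, the sketch below slices a cached list of ids per (tag, sort) filter. It is Python rather than C#, and everything in it is an assumption made for the example: `PAGE_SIZE`, `get_question_ids`, `get_page`, and the `load_ids` callback that stands in for the database query. Cache expiry is left out.

```python
from typing import Callable, Dict, List, Tuple

PAGE_SIZE = 3  # hypothetical; kept small so the example is easy to follow

# (tag, sort) -> ordered question ids; stands in for a real cache with expiry
_id_cache: Dict[Tuple[str, str], List[int]] = {}

IdLoader = Callable[[str, str], List[int]]


def get_question_ids(tag: str, sort: str, load_ids: IdLoader) -> List[int]:
    """Return the full ordered id list for a filter, loading it at most once."""
    key = (tag, sort)
    if key not in _id_cache:
        _id_cache[key] = load_ids(tag, sort)  # the only trip to the data store
    return _id_cache[key]


def get_page(tag: str, sort: str, page: int,
             load_ids: IdLoader) -> Tuple[List[int], int]:
    """Return (ids on the 1-based page, total number of pages)."""
    ids = get_question_ids(tag, sort, load_ids)
    start = (page - 1) * PAGE_SIZE
    page_ids = ids[start:start + PAGE_SIZE]  # paging is just a slice
    total_pages = (len(ids) + PAGE_SIZE - 1) // PAGE_SIZE  # ceiling division
    return page_ids, total_pages
```

With this shape, only the handful of ids on the requested page would need their full rows fetched, and the total page count comes from the length of the cached list.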

A question detail page will be entirely cached for anonymous users, while registered users see the latest goods. Also, the related questions on the side are cached to disk for a longer duration.
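
The anonymous-versus-registered split can be pictured like this. Again, this is a Python sketch under stated assumptions, not the real implementation: `serve_question_page`, the `render` callback, and the plain dictionary used as a cache are hypothetical, and expiry is omitted.

```python
from typing import Callable, Dict, Optional

# url -> rendered HTML for anonymous visitors; expiry omitted for brevity
_page_cache: Dict[str, str] = {}

Renderer = Callable[[str, Optional[str]], str]


def serve_question_page(url: str, user: Optional[str], render: Renderer) -> str:
    """Serve a cached copy to anonymous visitors, a fresh render to signed-in users."""
    if user is not None:
        return render(url, user)  # signed-in: always render fresh content
    if url not in _page_cache:
        _page_cache[url] = render(url, None)  # first anonymous hit fills the cache
    return _page_cache[url]
```

The cached copy is safe to share only because it contains nothing specific to one user.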

While we try to cache entire pages wherever possible, we do show user information at the page top - some parts just cannot be cached.

So look at caching like a puzzle - what parts can be safely shared between all my requests? Based on expense, what parts MUST be shared across all my requests?

Jarrod Dixon answered Sep 17 '22 23:09