How much traffic is heavy traffic? What are the best resources for learning about heavy-traffic web site development? What are the approaches?
There are a lot of principles that apply to any web site, irrespective of the underlying stack:
- Use the HTTP caching facilities. For one, there is the user agent cache. Second, the entire web backbone is full of proxies that can cache your requests, so use this to full advantage. A request that doesn't even land on your server adds 0 to your load; you can't optimize better than that :)
- Corollary to the point above: use CDNs (Content Delivery Networks, like CloudFront) for your static content. CSS, JPG, JS, static HTML and many more resources can be served from a CDN, saving the web server from an HTTP request.
- Second corollary to the first point: add expiration caching hints to your dynamic content. Even a short cache lifetime like 10 seconds will save a lot of hits that will instead be served by the proxies sitting between the client and the server (see the sketch after this list).
- Minimize the number of HTTP requests. It seems basic, but it is probably the most overlooked optimization available. In fact, Yahoo's best practices put this as the topmost optimization; see Best Practices for Speeding Up Your Web Site. Here is their best practices list:
- Minimize HTTP Requests
- Use a Content Delivery Network
- Add an Expires or a Cache-Control Header
- Gzip Components
- ... (the list is quite long actually, just read the link above)
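As an illustration of the expiration hints point above, here is a minimal sketch for classic ASP.NET (System.Web); the 10-second lifetime is just the example value from the list, tune it to your content:

    // Minimal sketch, assuming classic ASP.NET (System.Web).
    // Adds Cache-Control/Expires hints so browsers and intermediate proxies
    // can serve the response again without hitting the server.
    using System;
    using System.Web;

    public static class CachingHints
    {
        // Call this from a page or handler whose dynamic output can
        // tolerate being slightly stale (10 seconds here).
        public static void AddShortLivedCacheHints(HttpResponse response)
        {
            response.Cache.SetCacheability(HttpCacheability.Public); // cacheable by shared proxies
            response.Cache.SetExpires(DateTime.UtcNow.AddSeconds(10)); // Expires header
            response.Cache.SetMaxAge(TimeSpan.FromSeconds(10));        // Cache-Control: max-age=10
        }
    }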
Now, after you have eliminated as many of the superfluous hits as possible, you are still left with optimizing whatever requests actually do hit your server. Once your ASP code starts to run, everything will pale in comparison with the database requests:
- Reduce the number of DB calls per page. The best optimization possible is, obviously, not to make the request to the DB at all to start with. Some say 4 reads and 1 write per page are the most a high-load server should handle, others say one DB call per page, still others say 10 calls per page is OK. The point is that fewer is always better than more, and writes are significantly more costly than reads. Review your UI design; perhaps that hit count in the corner of the page that nobody sees doesn't need to be that accurate...
- Make sure every single DB request you send to the SQL Server is optimized. Look at each and every query plan, make sure you have proper covering indexes in place, make sure you don't do any table scans, review your clustered index design strategy, review all your IO load, storage design, etc. Really, there is no shortcut you can take here: you have to analyze and optimize the heck out of your database; it will be your choking point.
- Eliminate contention. Don't have readers wait for writers. For your stack, SNAPSHOT ISOLATION is a must (see the sketch further below).
- Cache results. And usually this is where the cookie crumbles. Designing a good cache is actually quite hard to pull off. I would recommend you watch the Facebook SOCC keynote: Building Facebook: Performance at Massive Scale. Somewhere around slide 47 they show what a typical internal Facebook API looks like:
    cache_get(
        $ids,
        'cache_function',
        $cache_params,
        'db_function',
        $db_params);
Everything is requested from a cache first, and if not found, requested from their MySQL back end. You probably won't start with 60,000 servers though :)
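The same read-through, cache-aside idea can be sketched in C#; the ICache interface and the loadFromDb delegate below are hypothetical placeholders, not Facebook's or any framework's actual API:

    // Minimal cache-aside sketch. ICache and its TryGet/Set members stand in
    // for whatever cache you use (ASP.NET Cache, a memcached client, etc.).
    using System;
    using System.Collections.Generic;

    public interface ICache
    {
        bool TryGet(string key, out object value);
        void Set(string key, object value, TimeSpan ttl);
    }

    public static class CacheAside
    {
        // Mirrors the cache_get pattern: try the cache for each id,
        // fall back to the database only for the misses.
        public static IDictionary<int, T> GetMany<T>(
            ICache cache,
            IEnumerable<int> ids,
            Func<int, string> cacheKey,
            Func<IEnumerable<int>, IDictionary<int, T>> loadFromDb,
            TimeSpan ttl)
        {
            var results = new Dictionary<int, T>();
            var misses = new List<int>();

            foreach (var id in ids)
            {
                object cached;
                if (cache.TryGet(cacheKey(id), out cached))
                    results[id] = (T)cached;
                else
                    misses.Add(id);
            }

            if (misses.Count > 0)
            {
                // Load only the missing ids, then populate the cache for next time.
                foreach (var pair in loadFromDb(misses))
                {
                    results[pair.Key] = pair.Value;
                    cache.Set(cacheKey(pair.Key), pair.Value, ttl);
                }
            }

            return results;
        }
    }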
On the SQL Server stack the best caching strategy is one based on Query Notifications. You can almost mix it with LINQ...
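As a rough illustration only, here is a minimal SqlDependency sketch; the connection string and the dbo.Products table are made-up placeholders, and SqlDependency.Start(connectionString) must have been called once at application startup:

    // Minimal Query Notifications sketch using SqlDependency
    // (System.Data.SqlClient). When the underlying data changes,
    // SQL Server fires OnChange and the stale cache entry is dropped.
    using System.Collections.Concurrent;
    using System.Data.SqlClient;

    public static class NotificationCache
    {
        private static readonly ConcurrentDictionary<int, string> cache =
            new ConcurrentDictionary<int, string>();

        public static string GetProductName(string connectionString, int productId)
        {
            string name;
            if (cache.TryGetValue(productId, out name))
                return name;

            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(
                // Notification queries need explicit columns and two-part table names.
                "SELECT Name FROM dbo.Products WHERE ProductId = @id", connection))
            {
                command.Parameters.AddWithValue("@id", productId);

                // Register the notification before executing the command.
                var dependency = new SqlDependency(command);
                dependency.OnChange += (sender, e) =>
                {
                    string removed;
                    cache.TryRemove(productId, out removed); // invalidate on change
                };

                connection.Open();
                name = (string)command.ExecuteScalar();
            }

            cache[productId] = name;
            return name;
        }
    }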
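And going back to the contention point above, a minimal sketch of enabling snapshot isolation and reading under it from ADO.NET; the MyDb database name and the connection strings are placeholders, and the ALTER DATABASE part is run once by an administrator:

    // Minimal sketch for the "eliminate contention" point.
    using System.Data;
    using System.Data.SqlClient;

    public static class SnapshotIsolationSetup
    {
        // One-time database setup (placeholder database name MyDb).
        public static void EnableSnapshot(string masterConnectionString)
        {
            using (var connection = new SqlConnection(masterConnectionString))
            using (var command = new SqlCommand(
                "ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON; " +
                "ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;",
                connection))
            {
                connection.Open();
                command.ExecuteNonQuery();
            }
        }

        // Readers under snapshot isolation see a consistent version of the
        // data and do not block behind writers.
        public static object ReadWithoutBlocking(string connectionString, string query)
        {
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                using (var transaction = connection.BeginTransaction(IsolationLevel.Snapshot))
                using (var command = new SqlCommand(query, connection, transaction))
                {
                    var result = command.ExecuteScalar();
                    transaction.Commit();
                    return result;
                }
            }
        }
    }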
I will define heavy traffic as traffic which triggers resource-intensive work. Meaning, if one web request triggers multiple SQL calls, or if they all calculate pi to a lot of decimals, then it is heavy.
If you are returning static HTML, then bandwidth is more of an issue than what a good server today can handle (more or less).
The principles are the same whether you use MVC or not when it comes to optimizing for speed.
- Having a decoupled architecture makes it easier to scale by adding more servers etc.
- Use a repository pattern for data retrieval (makes adding a cache easier).
- Cache data which is expensive to query.
- Data to be written could be written through a cache, so that the client doesn't have to wait for the actual database commit (see the sketch below).
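To illustrate the repository-plus-cache idea, here is a minimal sketch; IProductRepository, Product and the background write are hypothetical simplifications (a real write-behind needs durability and error handling):

    // Minimal sketch of a repository with a read cache and a write-behind
    // save. The inner repository would be the actual SQL-backed one.
    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    public class Product
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public interface IProductRepository
    {
        Product GetById(int id);
        void Save(Product product);
    }

    public class CachedProductRepository : IProductRepository
    {
        private readonly IProductRepository inner; // e.g. a SQL-backed repository
        private readonly ConcurrentDictionary<int, Product> cache =
            new ConcurrentDictionary<int, Product>();

        public CachedProductRepository(IProductRepository inner)
        {
            this.inner = inner;
        }

        public Product GetById(int id)
        {
            // Cache expensive reads: hit the database only on a miss.
            return cache.GetOrAdd(id, key => inner.GetById(key));
        }

        public void Save(Product product)
        {
            // Write through the cache so readers see the new value immediately,
            // and push the actual database commit to the background so the
            // client does not wait for it (no durability guarantees here!).
            cache[product.Id] = product;
            Task.Run(() => inner.Save(product));
        }
    }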
There are probably more ground rules as well. Maybe you can say something about the architecture of your application, and how much load you need to plan for?