
Post processing of reverse proxied HTTP requests? (like Akamai's ESI)

We run a relatively high-volume content site. Like most content sites, the majority of each page is relatively static. The articles rarely change, making them good candidates for some form of static/edge caching. There are two big problems, though. Secondary page elements (nav, recent content lists, etc.) change pretty frequently, quickly invalidating "full" cached pages. It's also quite common that we include more dynamic bits in a page, like user-specific information.

It would be really neat to have a reverse-proxy/load balancer that post-processed content and let us handle includes at the proxy/edge. The initial request to the backend would return a rough template, then the proxy software could process that template to complete it. The markup might look something like this:

<html>
<body>
  <div id="content">
    Lorem ipsum whackem smackem.
    <%
      dynamic "http://related.content.service/this/story"
    %>
  </div>
  <div id="sidebar">
    <%
      dynamic do |request|
        url = "http://my.user.service/user-widget.html"
        if request.cookies.key?("user_token")
          url = "http://my.user.service/" + request.cookies["user_token"] + "/user-widget.html"
        end

        error_text = "User service not available"
        { :url => url, :timeout => 500, :error => error_text }
      end
    %>
  </div>
</body>
</html>

What you'll see in that example is a small bit of Ruby that determines the included file based on a cookie value, then returns a hash with the URL to pull in, a timeout, and some default text to show in the event of an error. In theory, all the includes could be requested asynchronously as well.
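
A minimal Ruby sketch of that idea, not any existing proxy's API: each include is a hash like the one the template returns (`:url`, `:timeout` in milliseconds, `:error`), the includes are fetched on separate threads, and any fetch that fails or exceeds its timeout falls back to the error text. `resolve_includes`, the `fetcher` callable, and the "Related content not available" message are hypothetical; a real proxy would plug an HTTP client in where the stand-in fetcher is.

```ruby
require "timeout"

# Hypothetical helper: resolves a list of include specs concurrently.
# Each spec is a hash: { :url => String, :timeout => milliseconds, :error => String }.
# `fetcher` is any callable that takes a URL and returns the body as a String.
def resolve_includes(specs, fetcher)
  threads = specs.map do |spec|
    Thread.new do
      begin
        # Timeout.timeout takes seconds; the spec carries milliseconds.
        Timeout.timeout(spec[:timeout] / 1000.0) { fetcher.call(spec[:url]) }
      rescue StandardError
        # Timeouts and fetch failures both fall back to the error text.
        spec[:error]
      end
    end
  end
  # Thread#value joins the thread and returns its result, in input order.
  threads.map(&:value)
end

# Stand-in fetcher for illustration; the URLs come from the example template.
fake_fetcher = lambda do |url|
  case url
  when "http://related.content.service/this/story" then "<ul>related</ul>"
  when "http://my.user.service/user-widget.html"   then sleep(1) # simulates a slow service
  else raise IOError, "unreachable"
  end
end

specs = [
  { :url => "http://related.content.service/this/story", :timeout => 500, :error => "Related content not available" },
  { :url => "http://my.user.service/user-widget.html",   :timeout => 100, :error => "User service not available" }
]

results = resolve_includes(specs, fake_fetcher)
```

Because every include runs on its own thread, the page is held up by the slowest include or its timeout, whichever comes first, rather than by the sum of all of them.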

My understanding is that Amazon does something like this. Various page components are generated by backend services, with strict timeout limits to ensure overall page speed. I was hoping their CDN service would include something like this, but it's not to be!

There's an Edge Side Includes (ESI) spec, published as a W3C Note, that is almost what I want. There's very little support for it out there, however. It's available through Akamai, there's some Oracle software that does it, and the open source Varnish cache has a very basic implementation. It's also a really ugly XML format.

So the question is: what out there will let me do what I want? Is anyone else doing things in this way?

MrKurt asked Nov 30 '08 01:11


4 Answers

Set up nginx as a front end and use SSI to pull in the dynamic parts of the pages. The dynamic source can be an HTTP server, like Apache, or a FastCGI server running, for example, PHP or Django.

edit:

Many web servers support some form of SSI (Server Side Includes). This feature lets you add tags to the HTML as a very limited form of scripting, much simpler and faster (and older) than PHP. Using it, you can serve static pages holding most of the content, and for the 'small dynamic parts' an SSI tag references a dynamic page generated somewhere else.
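
To make the mechanism concrete, here is a rough Ruby sketch of what an SSI-capable front end does with a page: it scans for include directives of the form `<!--#include virtual="..." -->` and replaces each one with the response for that path. The `expand_ssi` helper and the `fetch` callable are hypothetical names, and a real server does this internally with more directives than are handled here, so treat it as an illustration only.

```ruby
# Matches SSI include directives such as <!--#include virtual="/sidebar" -->.
# Whitespace after "#" is tolerated, since some servers document that style.
SSI_INCLUDE = /<!--#\s*include\s+virtual="([^"]+)"\s*-->/

# Hypothetical helper: replaces every include directive in `html` with the
# body returned by `fetch`, a callable mapping a path to a String.
def expand_ssi(html, fetch)
  html.gsub(SSI_INCLUDE) { fetch.call(Regexp.last_match(1)) }
end

page = <<~HTML
  <div id="content">Lorem ipsum whackem smackem.</div>
  <div id="sidebar"><!--#include virtual="/user-widget.html" --></div>
HTML

# Stand-in for the dynamic backend (an HTTP server or FastCGI process).
backend = { "/user-widget.html" => "<p>Hello, user</p>" }
expanded = expand_ssi(page, ->(path) { backend.fetch(path, "") })
puts expanded
```

The static shell of the page can be cached for a long time, while only the small included fragment has to be generated per request.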

I particularly like nginx as a front end to almost anything. It's wicked fast, light on resources, and hugely scalable (think lighttpd with cleaner and more stable code). The author describes it not as a general-purpose web server but as a proxy front end. The backends can be an HTTP server (usually Apache) or FastCGI processes (PHP, Python, Perl, whatever), or a farm of either, or both.

The memcached module is amazing. It uses memcached (which is the fastest and most scalable general-purpose distributed hash table around) to map a URL directly to a cached web page, with no disk access involved. Since memcached is accessible from 'outside' the web server itself, it can be used even with dynamic pages (given a sane URL/resource mapping), but I don't think it would help a lot in your case. In any case, first make it work with SSI; then you can (if necessary) optimise the dynamic part with memcached.
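
The URL-keyed lookup can be sketched in Ruby as follows. A plain Hash stands in for memcached, and `serve`, `cache`, and `backend` are hypothetical names rather than anything nginx exposes. The point is only that the cache key is derived from the request URL, so anything that can write to memcached (the application itself, a background job) can publish a page that the front end then serves without touching the backend.

```ruby
# Hypothetical front-end lookup: try the cache under the request URI first,
# and fall back to the backend on a miss. `cache` only needs to respond to #[].
def serve(uri, cache, backend)
  cached = cache[uri]
  return [:hit, cached] if cached
  [:miss, backend.call(uri)]
end

cache = {}                               # stand-in for memcached
backend = ->(uri) { "rendered #{uri}" }  # stand-in for the dynamic app

# Nothing is cached yet, so the first request goes to the backend.
first = serve("/articles/42", cache, backend)

# The application writes the page into the cache from outside the web server...
cache["/articles/42"] = "<html>article 42</html>"

# ...and the next request for the same URL never reaches the backend.
second = serve("/articles/42", cache, backend)
```

Note that in this sketch the front end only reads from the cache; populating and invalidating entries is left to the application, which is why a sane URL/resource mapping matters.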

Javier answered Nov 20 '22 02:11


So it turns out that Varnish has (and had) basic ESI support that does nearly everything I wanted it to. If anyone needs to do some ESI stuff, Varnish seems to work pretty well for it. It's pretty basic, but still awesome.

MrKurt answered Nov 20 '22 02:11


I know a few people have written about using nginx SSI with the nginx memcached module to splice together content fragments. It's a lot more limited than something like ESI, but still useful.

Jason Watkins answered Nov 20 '22 00:11


Akamai has a solution for Edge Computing which allows J2EE to be run at the edge. Other alternatives today include any cloud computing service; Rackspace and Amazon are a couple of players in this market. Ideally you would use a combination of CDN and cloud computing to get the desired result. You could also opt to have the dynamic content served asynchronously via a web service after the page template loads, and then just cache the static page content with the HTML template.

Hunter answered Nov 20 '22 01:11