What experience do you have using nginx and memcached to optimize a web site? [closed]

We have a Java EE-based web application running on a Glassfish app server cluster. The incoming traffic will mainly be RESTful requests for XML-based representations of our application resources, but perhaps 5% of the traffic might be for JSON- or XHTML/CSS-based representations.

We're now investigating load-balancing solutions to distribute incoming traffic across the Glassfish instances in the cluster. We're also looking into how to offload the cluster using memcached, an in-memory distributed hash map whose keys would be the REST resource names (eg, "/user/bob", "/group/jazzlovers") and whose values are the corresponding XML representations.
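The cache layout described above can be sketched as follows (a plain Python dict stands in for the distributed memcached store; the URIs and XML bodies are illustrative):

```python
# Sketch of the proposed scheme: REST resource names as keys, XML
# representations as values. A plain dict stands in for memcached here.
cache = {
    "/user/bob": "<user><name>bob</name></user>",
    "/group/jazzlovers": "<group><name>jazzlovers</name></group>",
}

def lookup(uri):
    # A cache hit returns the stored XML; a miss would fall through to
    # the Glassfish cluster to regenerate the representation.
    return cache.get(uri)
```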

One approach that sounds promising is to kill both birds with one stone and use the lightweight, fast nginx HTTP server/reverse proxy. Nginx would handle each incoming request by first looking its URI up in memcached to see if there's an unexpired XML representation already there. If not, nginx sends the request on to one of the Glassfish instances. The nginx memcached module is described in this short writeup.

What is your overall impression with nginx and memcached used this way, how happy are you with them? What resources did you find most helpful for learning about them? If you tried them and they didn't suit your purposes, why not, and what did you use instead?

Note: here's a related question.

Update: I later asked the same question on ServerFault.com. The answers there are mainly suggesting alternatives to nginx (helpful, but indirectly).

asked Jan 23 '23 by Jim Ferrans

1 Answer

Assume you have a bank of upstream application servers delivering data to the users.

upstream webservices {
    server 10.0.0.1:80;
    server 10.0.0.2:80;
    server 10.0.0.3:80;
}
server {
    ... default nginx stuff ...
    location /dynamic_content {
        set            $memcached_key $uri;
        memcached_pass localhost:11211;
        default_type   text/html;
        error_page     404 502 = @dynamic_content_cache_miss;
    }
    location @dynamic_content_cache_miss {
        proxy_pass http://webservices;
    }
}
What the above nginx.conf snippet does is send every request for http://example.com/dynamic_content directly to the memcached server. If memcached has the content, your upstream servers will not see any traffic at all.

If the memcached lookup fails with a 404 or 502 (the key is not in the cache, or memcached cannot be reached), nginx passes the request on to the upstream servers. Since there are three servers in the upstream definition, you also get a transparent load-balancing proxy.

Now the only caveat is that you have to make sure that your backend application servers keep the data in memcache fresh. I use nginx + memcached + web.py to create simple little systems that handle thousands of requests per minute on relatively modest hardware.
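A hedged sketch of that freshness caveat from the application side: prime the cache when a resource is rendered, and evict the key when the resource changes. `FakeCache` here is a stand-in exposing the same `set`/`delete` calls a real memcached client (e.g. python-memcached) does; the helper names and five-minute TTL are illustrative, not from the original answer.

```python
# Stand-in for a memcached client; a real client would talk to the
# same memcached instance that nginx's memcached_pass reads from.
class FakeCache:
    def __init__(self):
        self.store = {}

    def set(self, key, value, time=0):
        self.store[key] = value

    def delete(self, key):
        self.store.pop(key, None)

cache = FakeCache()

def get_user(name):
    # On a cache miss nginx forwards the GET here; render the XML and
    # prime the cache so the next request is served by nginx alone.
    uri = "/user/" + name
    body = "<user><name>%s</name></user>" % name
    cache.set(uri, body, time=300)
    return body

def update_user(name):
    # After any write, evict the stale entry so the next GET re-renders it.
    cache.delete("/user/" + name)
```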

The general pseudocode for the web.py application server looks like this:

class some_page:
    def GET(self):
        output = 'Do normal page generation stuff'
        # nginx keys the cache on $uri, so use the request path as the key
        key = web.ctx.path.encode('ascii')
        # `cache` is a memcached client (e.g. python-memcached) set up elsewhere
        cache.set(key, str(output), seconds_to_cache_content)
        return output

The important thing to remember in the web.py pseudocode above is that content served from memcached via nginx cannot be changed at all: nginx hands back raw byte strings, not unicode. If you store unicode output in memcached, you'll get, at the very least, weird characters at the start and end of your cached content.
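A minimal illustration of that byte-string rule (the page content is made up):

```python
# nginx serves whatever bytes memcached returns, verbatim. Encode any
# unicode output to a plain byte string before caching it.
page = u"<p>r\u00e9sum\u00e9</p>"   # unicode output from the app
raw = page.encode("utf-8")          # safe to store in memcached
```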

I use nginx and memcached for a sports-related website where we get huge pulses of traffic that last only a few hours. I could not get by without nginx and memcached. Server load during our last big Fourth of July sports event dropped from 70% to 0.6% after implementing the above changes. I can't recommend it enough.

answered Apr 28 '23 by Great Turtle