
Which method of caching is the fastest/lightest for Node/Mongo/NginX?

I've been tasked to work on a project for a client whose site he estimates will receive 1-2M hits per day. He has an existing database of 58M users that needs to be seeded into the new brand on a per-registration basis. Most of the site's content is served from data supplied by external APIs; what we store in our Mongo setup is mostly profile information and saved API parameters.

NginX will listen on port 80 and load-balance to a Node cluster on ports 8000 - 8010.

My question is what to do about caching. I come from a LAMP background, so I'm used to either writing static HTML files with PHP and serving those up to minimize MySQL load, or using Memcached for sites that required a higher level of caching. This setup is a bit foreign to me.

Which of the following is ideal in terms of minimal response time and CPU load?

1: Page-level caching with NginX

Reference: http://andytson.com/blog/2010/04/page-level-caching-with-nginx/

server {
    listen            80;
    server_name       mysite.com;

    proxy_set_header  X-Real-IP  $remote_addr;
    proxy_set_header  Host       $host;

    location / {
        proxy_pass    http://localhost:8080/;
        proxy_cache   anonymous;
    }

    # don't cache the admin folder; send all requests through the proxy
    location /admin {
        proxy_pass    http://localhost:8080/;
    }

    # handle static files directly. Set their expiry time to max, so they'll
    # always use the browser cache after the first request
    location ~* \.(css|js|png|jpe?g|gif|ico)$ {
        root          /var/www/${host}/http;
        expires       max;
    }
}
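
For this config to work, the anonymous cache zone referenced by proxy_cache also has to be declared at the http level; a minimal sketch (the cache path, zone size, and expiry values here are assumptions):

    http {
        # defines the "anonymous" zone: 10 MB of shared memory for cache keys,
        # with cached responses stored on disk under /var/cache/nginx
        proxy_cache_path  /var/cache/nginx  levels=1:2  keys_zone=anonymous:10m
                          inactive=10m  max_size=1g;
    }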


2: Redis as a cache bucket

The hash() function is the numbers() function on this page: http://jsperf.com/hashing-strings

// Simple multiplicative string hash (the numbers() variant from the jsPerf
// page above). Note that for long strings the result exceeds
// Number.MAX_SAFE_INTEGER and degrades into an imprecise float (see the
// 1.8e+136 example below), which increases the risk of key collisions.
function hash(str) {
    var res = 0,
        len = str.length;
    for (var i = 0; i < len; i++) {
        res = res * 31 + str.charCodeAt(i);
    }
    return res;
}

var apiUrl = 'https://www.myexternalapi.com/rest/someparam/someotherparam/?auth=3dfssd6s98d7f09s8df98sdef';
var key    = hash(apiUrl).toString(); // 1.8006908172911553e+136

myRedisClient.set(key, theJSONresponse, function(err) { /* ... */ });
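
A fuller read-through version with a TTL might look like this (a sketch using the node_redis callback API; the fetchFromApi() helper and the 60-second expiry are assumptions):

    var redis  = require('redis');
    var client = redis.createClient();

    function getCached(apiUrl, fetchFromApi, callback) {
        var key = hash(apiUrl).toString();
        client.get(key, function(err, cached) {
            if (!err && cached) return callback(null, cached); // cache hit
            // cache miss: call the external API, then store with a 60s TTL
            fetchFromApi(apiUrl, function(err, theJSONresponse) {
                if (err) return callback(err);
                client.setex(key, 60, theJSONresponse, function() {});
                callback(null, theJSONresponse);
            });
        });
    }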


3: Node writing JSON files

The hash() function is the same one shown in option 2.

var fs     = require('fs');
var apiUrl = 'https://www.myexternalapi.com/rest/someparam/someotherparam/?auth=3dfssd6s98d7f09s8df98sdef';
var key    = hash(apiUrl).toString(); // 1.8006908172911553e+136

fs.writeFile('/var/www/_cache/' + key + '.json', theJSONresponse, function(err) { /* ... */ });
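
The read side would then check that the file exists and is fresh before falling back to the API; a rough sketch (the 60-second staleness window and the fetchAndCache() helper are assumptions):

    var cachePath = '/var/www/_cache/' + key + '.json';

    fs.stat(cachePath, function(err, stats) {
        // treat a missing file, or one older than 60s, as a cache miss
        if (err || Date.now() - stats.mtime.getTime() > 60 * 1000) {
            return fetchAndCache(); // assumed helper: call the API, rewrite the file
        }
        fs.readFile(cachePath, 'utf8', function(err, theJSONresponse) {
            // serve theJSONresponse
        });
    });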


4: Varnish in front

I did some research, and benchmarks like the ones on this site are steering me away from this option, but I'm still open to considering it if it makes the most sense: http://todsul.com/nginx-varnish

asked Mar 21 '13 by Maverick


2 Answers

I would do a combination: use Redis to cache session/user API calls with a short TTL, and use Nginx to cache long-term RESTful data and static assets. I wouldn't write JSON files, as I imagine the file-system I/O would be the slowest and most CPU-intensive of the options listed.

answered by AlienWebguy


  1. Nginx page-level caching is good for static content, but it's no good for dynamic content. For example, how do you invalidate the cache when the content changes upstream?

  2. Redis is perfect as an in-memory data store, but I don't like using it as a cache. With a limited amount of memory, I'd have to constantly worry about running out. Yes, you can set up a key-expiry strategy in Redis (see the redis.conf sketch after this list), but that's extra work, and it's still not as good as I'd want a cache provider to be.
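
For what it's worth, Redis can be capped and left to evict keys on its own, which addresses part of the memory concern; a minimal redis.conf sketch (the memory limit is an assumption):

    # cap Redis at 256 MB and evict the least-recently-used keys when full
    maxmemory 256mb
    maxmemory-policy allkeys-lru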

I have no experience with options 3 and 4.

I'm surprised that you don't include memcached here as an option. From my experience, it's solid as a cache provider. One memcached behavior that Redis (by default) doesn't have is that it does not guarantee a key will survive until the expiry time you specified. That's bad for a data store, but it makes memcached a perfect candidate for caching: you don't need to worry about using up the memory you've assigned to it, because memcached will evict the least-used keys even when their expiry times haven't been reached yet.

Nginx provides a built-in memcached module. It's solid, and you'll find a number of tutorials if you Google online.

The one I like most is linked below. Cache invalidation is easy: for example, if a page is updated upstream, just delete the memcached key from the upstream app server. The author claimed a 4x improvement in response time; I believe that's good enough for your use case.

http://www.igvita.com/2008/02/11/nginx-and-memcached-a-400-boost/
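
The pattern from that article looks roughly like this (a sketch; the memcached address, key scheme, and backend port are assumptions):

    # try memcached first; fall back to the Node app on a miss
    location / {
        default_type   text/html;
        set            $memcached_key  "$uri";
        memcached_pass 127.0.0.1:11211;
        error_page     404 502 504 = @app;
    }

    # on a miss, the app renders the page and writes it to memcached under
    # the same key, so the next request is served straight from memory
    location @app {
        proxy_pass http://localhost:8080;
    }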

answered by Chuan Ma