I've been tasked with a project for a client whose site he estimates will receive 1-2M hits per day. He has an existing database of 58M users that needs to be seeded into the new brand on a per-registration basis. Most of the site's content is served from external API data; what we store in our Mongo setup is mostly profile information and saved API parameters.
Nginx will listen on port 80 and load-balance to a Node cluster on ports 8000 - 8010.
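A minimal sketch of that front end, using the ports from the question (the upstream name and loopback addresses are my own assumptions):

```nginx
# Sketch: nginx on port 80 load-balancing the Node cluster on 8000-8010
upstream node_cluster {              # name is an assumption
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    # ... one entry per port through 127.0.0.1:8010
}

server {
    listen 80;
    location / {
        proxy_pass http://node_cluster;
    }
}
```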
My question is what to do about caching. I come from a LAMP background so I'm used to either writing static HTML files with PHP and serving those up to minimize MySQL load or using Memcached for sites that required a higher level of caching. This setup is a bit foreign to me.
Which approach is ideal in terms of minimal response time and CPU load?
Reference: http://andytson.com/blog/2010/04/page-level-caching-with-nginx/
server {
    listen 80;
    server_name mysite.com;

    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Host $host;

    location / {
        proxy_pass http://localhost:8080/;
        # the "anonymous" zone must be declared via proxy_cache_path in the http block
        proxy_cache anonymous;
    }

    # don't cache the admin folder; send all requests through the proxy
    location /admin {
        proxy_pass http://localhost:8080/;
    }

    # serve static files directly and set their expiry time to max, so
    # they'll always come from the browser cache after the first request
    location ~* \.(css|js|png|jpe?g|gif|ico)$ {
        root /var/www/${host}/http;
        expires max;
    }
}
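For `proxy_cache anonymous;` to work, nginx needs the cache zone declared at the `http` level; a minimal sketch (the path, sizes, and times here are illustrative values, not prescriptive):

```nginx
http {
    # defines the "anonymous" zone referenced by "proxy_cache anonymous;"
    proxy_cache_path /var/cache/nginx levels=1:2
                     keys_zone=anonymous:10m max_size=1g inactive=60m;
    proxy_cache_valid 200 302 10m;  # how long to keep cacheable responses
    # server { ... } blocks from the example above go here
}
```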
The hash() function is the numbers() function on this page: http://jsperf.com/hashing-strings
function hash(str) {
    var res = 0,
        len = str.length;
    for (var i = 0; i < len; i++) {
        res = res * 31 + str.charCodeAt(i);
    }
    return res;
}
var apiUrl = 'https://www.myexternalapi.com/rest/someparam/someotherparam/?auth=3dfssd6s98d7f09s8df98sdef';
var key = hash(apiUrl).toString(); // 1.8006908172911553e+136
myRedisClient.set(key,theJSONresponse, function(err) {...});
This uses the same hash() function as above:
var fs = require('fs');
var apiUrl = 'https://www.myexternalapi.com/rest/someparam/someotherparam/?auth=3dfssd6s98d7f09s8df98sdef';
var key = hash(apiUrl).toString(); // 1.8006908172911553e+136
fs.writeFile('/var/www/_cache/' + key + '.json', theJSONresponse, function(err) {...});
I did some research, and benchmarks like the ones on this site are steering me away from this solution, but I'm still open to it if it makes the most sense: http://todsul.com/nginx-varnish
Having a web server like Nginx read static content from disk is also going to be faster than Node.js. Even for clustering, a reverse proxy like Nginx can sometimes be more efficient, since it uses less memory and CPU than an additional Node process.
Caching is the process of storing data in a high-speed storage layer so that future requests for such data can be fulfilled much faster than is possible through accessing its primary storage location.
I would do a combination: use Redis to cache session-scoped user API calls with a short TTL, and use Nginx to cache long-lived REST data and static assets. I wouldn't write JSON files, as I imagine the filesystem I/O would be the slowest and most CPU-intensive of the options listed.
Nginx page-level caching is good for caching static content, but it's no good for dynamic content. For example, how do you invalidate the cache when the content changes in the upstream?
Redis is perfect as an in-memory data store, but I don't like using it as a cache. With a limited amount of memory, I'd have to constantly worry about running out. Yes, you can set up a strategy for expiring keys in Redis, but that's extra work, and it's still not as good as I'd want a cache provider to be.
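The expiry strategy referred to here can be a one-time configuration rather than ongoing work; a sketch of the relevant redis.conf directives (the 256mb cap is an arbitrary example value):

```
# redis.conf sketch: cap memory and evict least-recently-used keys,
# which makes Redis behave like a bounded cache
maxmemory 256mb
maxmemory-policy allkeys-lru
```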
I have no experience with choices 3 and 4.
I'm surprised you don't include memcached here as an option. From my experience, it's solid as a cache provider. One memcached feature that Redis doesn't have is that it doesn't guarantee a key will survive until the expiry time you specified. That's bad for a data store, but it makes memcached a perfect candidate for caching: you don't need to worry about using up the memory you've assigned to it, because memcached evicts the least-used keys even when their expiry times haven't been reached yet.
Nginx provides a built-in memcached module. It's solid, and there are a number of tutorials online if you google around.
This is the option I like the most (see the link below). Cache invalidation is easy: for example, when a page is updated upstream, just delete the memcached key from the upstream app server. The author claims a 4x improvement in response time, which I believe is good enough for your use case.
http://www.igvita.com/2008/02/11/nginx-and-memcached-a-400-boost/
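The pattern from the linked article can be sketched roughly as follows; the key scheme, memcached address, backend port, and fallback location name are all assumptions:

```nginx
server {
    listen 80;
    location / {
        set $memcached_key "$uri";       # key scheme is an assumption
        memcached_pass 127.0.0.1:11211;  # try memcached first
        default_type application/json;
        error_page 404 502 504 = @app;   # fall back to the app on a miss
    }
    location @app {
        proxy_pass http://localhost:8080; # port is an assumption
    }
}
```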