We're implementing some code on our homepage that depends on a GeoIP lookup for every unique visitor, and we have a choice of querying a GeoIP web service (MaxMind) or downloading the database and querying it locally. The server is currently a 512MB Linode instance and only receives around 1,500 uniques per day, but we're concerned about what will happen during a traffic spike.
It's clear to me that the local lookup would be faster, but surely under load, hitting the database would be far more RAM/CPU intensive than processing an additional HTTP request. Obviously, upgrading the server's memory is easily achieved if going local is the best route. I would imagine that above a certain number of connections per second, the server would be more likely to become unavailable querying its local database than a web API (assuming, of course, that the web API's servers can handle the HTTP requests).
Any input on which option would be the more robust long-term solution for handling the occasional traffic flood without a denial of service would be greatly appreciated.
EDIT: Just to try to quantify this a bit further, let's say the server would need to handle up to 200 connections per second, each requiring a single DB query or HTTP request.
Given 200 hits per second, you definitely want to go with a local database. I doubt that any third-party provider of such a service would want to handle that many requests, but even if they did, it would likely be much slower than local database access.
You will of course want to increase the memory significantly on the local server. Keep in mind that memory is extremely cheap nowadays; for example, about a year and a half ago I built a Linux box with 32 GB of RAM and the latest CPU for under $1,000.
200 hits per second may require a clustered database and multiple servers. You may also want to consider using a key-value store for this, which can potentially handle thousands of lookups per second.
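One way to use a key-value store here (a sketch, not the only approach) is as a cache in front of whichever lookup backend you choose, so repeat visitors never trigger a second database query or HTTP request. A minimal Python sketch, assuming Redis with the redis-py client and a do_geoip_lookup() backend function of your own:

```python
import json

import redis  # redis-py client; assumes a Redis instance on localhost

r = redis.Redis(host="localhost", port=6379)

def cached_lookup(ip, do_geoip_lookup, ttl=86400):
    """Return a cached GeoIP result for `ip`, falling back to the real lookup.

    `do_geoip_lookup` stands in for whichever backend you pick (local
    database or web service); results are cached for `ttl` seconds.
    """
    cached = r.get("geoip:" + ip)
    if cached is not None:
        return json.loads(cached)
    result = do_geoip_lookup(ip)              # hit the real backend once per IP
    r.setex("geoip:" + ip, ttl, json.dumps(result))
    return result
```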
The best way to find out is to run a load test and see for yourself what performs best. My gut feeling, though, is that 200 queries per second is not an especially high rate and can easily be handled whichever way you prefer.
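To put some numbers behind that gut feeling, a rough load test is easy to write. A minimal Python sketch, assuming you supply a lookup_fn for whichever backend you are testing (the IP list and function names here are made up):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_test(lookup_fn, ips, concurrency=200):
    """Fire one lookup per IP with `concurrency` worker threads and
    report rough throughput. `lookup_fn` is the backend under test."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(lookup_fn, ips))
    elapsed = time.perf_counter() - start
    print(f"{len(ips)} lookups in {elapsed:.2f}s "
          f"({len(ips) / elapsed:.0f} lookups/sec)")

# Example with a dummy backend; replace with your real lookup function.
if __name__ == "__main__":
    load_test(lambda ip: ip, ["192.0.2.%d" % i for i in range(1, 255)] * 40)
```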
I'd lean towards a local server with a GIS database, because local is cheaper (no per-query costs) and a single server is easier to update. You can also scale up later if needed by adding a second server or by switching the configuration over to a third-party service.
I found some actual test numbers for this kind of setup here: http://blog.jcole.us/2007/11/24/on-efficiently-geo-referencing-ips-with-maxmind-geoip-and-mysql-gis/ — a network setup where four clients query a single server. For a single client he reports about 2,000 queries per second with latencies below 1 ms. Note that he uses the GIS features built into MySQL to reach these numbers. Also, this was 2007, so performance should be even better now.
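For illustration only, the core trick in that article is to store each IP range as a degenerate polygon with a spatial index and then find a visitor's range by wrapping the address in a point. A hedged Python sketch of that style of query, using the pre-8.0 MySQL spatial function names from the 2007 post (newer MySQL renames these to ST_-prefixed equivalents) and a hypothetical ip_blocks table:

```python
import mysql.connector  # MySQL Connector/Python driver

conn = mysql.connector.connect(user="geoip", password="changeme", database="geoip")

def lookup_country(ip):
    """Find the country for a dotted-quad IP via a spatially indexed table.

    Assumes a hypothetical ip_blocks table with an ip_poly POLYGON column
    (one degenerate polygon per IP range) and a country_code column.
    """
    cur = conn.cursor()
    # Pre-8.0 spatial function names, as used in the linked 2007 article.
    cur.execute(
        "SELECT country_code FROM ip_blocks "
        "WHERE MBRCONTAINS(ip_poly, POINTFROMWKB(POINT(INET_ATON(%s), 0)))",
        (ip,),
    )
    row = cur.fetchone()
    cur.close()
    return row[0] if row else None
```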
Make your code modular and configurable. This depends on the programming language, of course, but create a LookupService interface so that you can swap in different implementations without changing the rest of the code. If an implementation calls a web service, make the endpoint easily configurable. If your own service uses the same URI scheme and response JSON as MaxMind's, you can switch at will between your local server and MaxMind; see the sketch below.
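As a concrete illustration of that interface idea (the names are my own, not from the question): the local variant assumes MaxMind's geoip2 Python library with a downloaded .mmdb file, and the web variant assumes a MaxMind-style JSON endpoint queried with requests.

```python
from abc import ABC, abstractmethod
from typing import Optional

import geoip2.database  # MaxMind's Python client for a downloaded .mmdb database
import requests


class LookupService(ABC):
    """Common interface so the rest of the code never knows which backend is in use."""

    @abstractmethod
    def country_code(self, ip: str) -> Optional[str]:
        ...


class LocalDbLookup(LookupService):
    """Looks up IPs in a locally downloaded MaxMind database."""

    def __init__(self, db_path="GeoLite2-Country.mmdb"):
        self.reader = geoip2.database.Reader(db_path)

    def country_code(self, ip):
        return self.reader.country(ip).country.iso_code


class WebServiceLookup(LookupService):
    """Looks up IPs against a configurable, MaxMind-style HTTP endpoint."""

    def __init__(self, endpoint):
        # A configurable endpoint lets you point this at your own server
        # or at MaxMind without touching the calling code.
        self.endpoint = endpoint

    def country_code(self, ip):
        resp = requests.get(f"{self.endpoint}/{ip}", timeout=2)
        resp.raise_for_status()
        return resp.json().get("country", {}).get("iso_code")
```

The rest of the application then depends only on LookupService, and a single config value decides which implementation gets constructed at startup.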