
Do companies that provide APIs use a shim or proxy in front of their APIs?

I'm researching how large companies manage their public APIs. I'm thinking of companies with mature, established APIs such as Google, Facebook, Twitter, and Amazon.

These companies have a number of different APIs that they expose to the public. Google, for example, has publicly consumable APIs for Plus, AdSense, AdWords, and so on. I'd like to understand whether they use a cluster of reverse-proxy servers in front of those APIs to provide common functionality, so that their specialist API servers don't need to implement it themselves.

For example, throttling and authentication could be handled at this layer instead of being implemented in each API cluster.

The questions: does anyone use a shim or reverse proxy in front of their APIs to handle common tasks? What use cases make a reverse proxy a good or bad idea for a cluster of API servers?
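To make the idea concrete, here is roughly the kind of shim I have in mind, sketched in Go. The upstream address, the API-key store, and the rate limits are all placeholders, not any real service:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"

	"golang.org/x/time/rate"
)

func main() {
	// Hypothetical internal API backend; the shim is the only public entry point.
	upstream, _ := url.Parse("http://api-backend.internal:8080")
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	validKeys := map[string]bool{"demo-key": true} // stand-in for a real credential store
	limiter := rate.NewLimiter(100, 200)           // illustrative: ~100 req/s, bursts of 200

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !validKeys[r.Header.Get("X-Api-Key")] {
			http.Error(w, "invalid API key", http.StatusUnauthorized)
			return
		}
		if !limiter.Allow() {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		// The specialist API servers behind the shim never see
		// unauthenticated or over-quota traffic.
		proxy.ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":8000", handler))
}
```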

asked Dec 16 '13 by Guy

1 Answer

Most large companies use a variety of techniques to handle the traffic and load on their servers. Roughly speaking:

  1. A load balancer sits between the entry point and the actual backend servers.
  2. A reverse proxy often sits between these to serve static files, pre-computed/rendered views, and other largely static assets.
  3. Anycast is used for DNS purposes, so that you are routed to the nearest server that handles that URL.
  4. Back pressure is employed to limit the number of requests feeding through a single pipeline, so that services don't tip over.
  5. Memcached, Redis, and the like are used as short-term caches. That is, if a result is going to be roughly the same for, say, 5 seconds, it can be cached in memory for faster delivery. Some proxies can be configured to read out of these. (A sketch of back pressure and this kind of caching follows the list.)
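Here is a minimal sketch of ideas 4 and 5 using only Go's standard library: a buffered channel acts as a semaphore for back pressure, and a tiny in-memory TTL map stands in for Memcached/Redis. All names and limits are illustrative:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync"
	"time"
)

// Back pressure: at most 64 requests may be in flight at once.
var sem = make(chan struct{}, 64)

// Short-term cache: entries live for ~5 seconds, as in the example above.
type entry struct {
	body    string
	expires time.Time
}

var (
	mu    sync.Mutex
	cache = map[string]entry{}
)

func handle(w http.ResponseWriter, r *http.Request) {
	select {
	case sem <- struct{}{}: // acquire a slot
		defer func() { <-sem }()
	default: // pipeline is full; shed load instead of tipping over
		http.Error(w, "server busy", http.StatusServiceUnavailable)
		return
	}

	mu.Lock()
	e, ok := cache[r.URL.Path]
	mu.Unlock()
	if ok && time.Now().Before(e.expires) {
		fmt.Fprint(w, e.body) // serve the few-seconds-old result from memory
		return
	}

	body := expensiveRender(r.URL.Path) // placeholder for the real backend work
	mu.Lock()
	cache[r.URL.Path] = entry{body: body, expires: time.Now().Add(5 * time.Second)}
	mu.Unlock()
	fmt.Fprint(w, body)
}

func expensiveRender(path string) string { return "rendered " + path }

func main() { log.Fatal(http.ListenAndServe(":8000", http.HandlerFunc(handle))) }
```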

If you're really interested, start reading the Netflix tech blog. Take a look at some of the open-source tools they've built, like Hystrix or Zuul. You can also take a look at some of their videos. They make heavy use of proxies and have built in some very advanced distributed behavior.

As far as a reverse proxy being a good idea, think in terms of failure. If your service calls another API directly and that service fails, your service fails too, and the failure cascades up to the end user. If the call instead goes through a reverse proxy, the proxy can be configured to detect failures (or even auto-detect them) and divert traffic to backup servers.
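As a rough illustration of that diversion, Go's httputil.ReverseProxy lets you hang an error handler off the primary route. Both addresses here are hypothetical, and a real setup would add health checks and timeouts:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical primary and backup upstreams.
	primary, _ := url.Parse("http://api-primary.internal:8080")
	backup, _ := url.Parse("http://api-backup.internal:8080")

	backupProxy := httputil.NewSingleHostReverseProxy(backup)
	proxy := httputil.NewSingleHostReverseProxy(primary)
	proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
		// The primary failed: divert this request to the backup instead of
		// cascading the failure up to the end user. (Safe for idempotent
		// requests; retrying requests with bodies needs more care.)
		backupProxy.ServeHTTP(w, r)
	}

	log.Fatal(http.ListenAndServe(":8000", proxy))
}
```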

As far as a reverse proxy being a good idea, also think in terms of load. Sometimes a single server can only handle a fraction of the traffic, so the load must be shared across many servers. This is true not just of CPU-bound but also IO-bound resources (even if the response itself is not what causes the IO cap).
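A bare-bones way to share that load is a round-robin director. The upstream pool below is made up, and production balancers also weigh health, latency, and connection counts:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

func main() {
	// Made-up pool of identical API servers sharing the load.
	var pool []*url.URL
	for _, addr := range []string{
		"http://api-1.internal:8080",
		"http://api-2.internal:8080",
		"http://api-3.internal:8080",
	} {
		u, _ := url.Parse(addr)
		pool = append(pool, u)
	}

	var next uint64
	proxy := &httputil.ReverseProxy{
		Director: func(r *http.Request) {
			// Pick the next upstream in round-robin order.
			u := pool[atomic.AddUint64(&next, 1)%uint64(len(pool))]
			r.URL.Scheme = u.Scheme
			r.URL.Host = u.Host
		},
	}

	log.Fatal(http.ListenAndServe(":8000", proxy))
}
```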

Daisy chaining like this presents its own special little hell, but it's sometimes unavoidable. The biggest downside, and what makes it a really bad choice if you can possibly avoid it, is the loss of deterministic behavior. Sometimes the stupidest things will bring your servers down; and by stupid I mean really, really dumb stuff that you never thought in a million years might bite you (think server clocks out of sync). You have to start using rolling deploys of code, take down servers manually or forcefully if they stop responding, and keep those proxy configs in good order.

HTTP/1.1 support can also be an issue. Not all reverse proxies adhere to the spec; in fact, some of them only cover ~50% of it. HAProxy does not do SSL. And if you're on limited hardware, a thread-based proxy can unexpectedly swamp the system with threads.

Finally, adding in a proxy is one more thing that will break (not can, will). You have to monitor proxies just like any other piece of the platform, aggregate their logs, and run mock drills on them too.
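Monitoring can start as simply as wrapping whatever proxy handler you run so that every request emits a status and latency line for your log aggregator. The backend below is a stand-in; the wrapper itself is generic:

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// statusRecorder captures the status code the wrapped handler writes.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (s *statusRecorder) WriteHeader(code int) {
	s.status = code
	s.ResponseWriter.WriteHeader(code)
}

// logged wraps any handler (e.g. your proxy) so every request produces a
// log line that can be aggregated and alerted on.
func logged(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r)
		log.Printf("method=%s path=%s status=%d dur=%s",
			r.Method, r.URL.Path, rec.status, time.Since(start))
	})
}

func main() {
	backend := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok")) // stand-in for the real proxy handler
	})
	log.Fatal(http.ListenAndServe(":8000", logged(backend)))
}
```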

answered Oct 22 '22 by wheaties