When a reverse proxy is used primarily for load balancing, it is obvious why routing requests across a pool of N proxied servers helps balance the load.
However, once the server-side computation for a request is complete and it is time to dispatch the response back to the client, how come the single reverse proxy server never becomes a bottleneck?
My intuitive understanding of the reverse proxy concept tells me:

1. The reverse proxy server proxying N origin servers behind it would obviously NOT become a bottleneck as easily or as early as a single-server equivalent of the N proxied servers would, BUT it too must become a bottleneck at some point, because all N proxied servers' responses pass through it.
2. To push that bottleneck point even further out, the N proxied servers should really dispatch their responses directly to the client 'somehow', instead of routing them via the single reverse proxy sitting in front of them.

Where does my understanding of the reverse proxy concept go wrong? Maybe #2 is by definition NOT a reverse proxy setup, but definitions aside, why is #2 not popular relative to the plain reverse proxy option?
A reverse proxy, when used for load balancing, proxies all traffic to the pool of origin servers. This means the client's TCP connection terminates at the LB (the reverse proxy), and the LB initiates a new TCP connection to one of the origin nodes on behalf of the client. After processing the request, the node cannot communicate with the client directly, because the client's TCP connection is open with the Load Balancer's IP. The client is expecting a response from the LB, and not from any other random dude, or the random IP (-: of some node. Thus the response usually flows the same way as the request, via the LB. You also do not want to expose the nodes' IPs to the client.

This all usually scales very well for request-response systems. So my answer to #1 is: the LB usually scales well for request-response systems. If required, more LBs can be added behind a VIP to create redundancy.
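To make this concrete, here is a minimal sketch of such a proxying load balancer in Go; the origin addresses and ports are hypothetical, and a production LB would add health checks, timeouts, and so on. The point to notice is that every response is written back through the proxy's own listener, which is exactly why the proxy sits on the response path:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		log.Fatal(err)
	}
	return u
}

func main() {
	// Hypothetical origin pool; clients only ever see the proxy's IP.
	origins := []*url.URL{
		mustParse("http://10.0.0.1:8080"),
		mustParse("http://10.0.0.2:8080"),
		mustParse("http://10.0.0.3:8080"),
	}

	var counter uint64
	proxy := &httputil.ReverseProxy{
		Rewrite: func(r *httputil.ProxyRequest) {
			// Round-robin: the LB opens its own connection to the
			// chosen origin on behalf of the client.
			i := atomic.AddUint64(&counter, 1)
			r.SetURL(origins[i%uint64(len(origins))])
		},
	}

	// Both requests and responses traverse this single listener,
	// because the client's TCP connection terminates here.
	log.Fatal(http.ListenAndServe(":80", proxy))
}
```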
Now, having said this, it still makes sense to bypass the LB when writing responses if your responses are huge. For example, if you are streaming video in response, you probably don't want to choke your LB with humongous responses. In such a scenario, one would configure a Direct Server Return (DSR) LB. This is essentially what you are thinking of in #2. It allows responses to flow directly from the origin servers, bypassing the LB, while still hiding the origin nodes' IPs from clients. This is achieved by configuring ARP in a special way, such that the responses written by the origin nodes carry the IP of the LB (sketched below). This is not straightforward to set up, and the usual proxy mode of the LB is fine for most use cases.
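For a concrete picture of that ARP trick, this is roughly what the classic Linux LVS-DR real-server configuration looks like on each origin node; the VIP 203.0.113.10 is a hypothetical example, and a real deployment would persist these settings rather than set them ad hoc:

```sh
# On each origin (real server): hold the LB's VIP on the loopback
# interface, so the node accepts packets addressed to the VIP and
# sources its responses from it.
ip addr add 203.0.113.10/32 dev lo

# But never answer ARP for the VIP, so on the local network the VIP
# resolves only to the LB's MAC address.
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
```

The LB then forwards each inbound frame at layer 2 to a chosen node's MAC; because that node also holds the VIP on loopback, it accepts the packet and replies straight to the client with the VIP as the source address, never touching the LB on the way out.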