Right now I'm testing an extremely simple semaphore in one of my production regions in AWS. On deployment the latency jumped from 150ms to 300ms. I assumed some latency would occur, but if it could be reduced that would be great. This is a bit new to me, so I'm experimenting. I've set the semaphore to allow 10000 connections, which is the same as the maximum number of connections Redis is set to. Is the code below optimal? If not, can someone help me optimize it, or tell me if I'm doing something wrong, etc.? I want to keep this as a piece of middleware so that I can simply call it like this on the server: n.UseHandler(wrappers.DoorMan(wrappers.DefaultHeaders(myRouter), 10000))
package wrappers

import "net/http"

// DoorMan limits the number of requests being served at once.
func DoorMan(h http.Handler, n int) http.Handler {
	sema := make(chan struct{}, n)

	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		sema <- struct{}{}
		defer func() { <-sema }()

		h.ServeHTTP(w, r)
	})
}
The solution you outline has some issues. But first, let's take a small step back; there are two questions in this, one of them implied: whether you want to rate limit inbound requests to your service, and whether you want to prevent too many concurrent requests from overloading the Redis backend.
What it sounds like you want to do is actually the second, to prevent too many requests from hitting Redis. I'll start by addressing the first one and then make some comments on the second.
If you really do want to rate limit inbound connections "at the door", you should normally never do that by waiting inside the handler. With your proposed solution, the service will keep accepting requests, which will queue up at the sema <- struct{}{} statement. If the load persists, it will eventually take down your service, either by running out of sockets, memory, or some other resource. Also note that if your request rate is approaching saturation of the semaphore, you would see an increase in latency caused by goroutines waiting at the semaphore before handling the request.
A better way to do it is to always respond as quickly as possible (especially when under heavy load). This can be done by sending a 503 Service Unavailable back to the client, or to a smart load balancer, telling it to back off.
In your case, it could for example look like something along these lines:
select {
case sema <- struct{}{}:
	// Acquired a slot; release it when the handler is done.
	defer func() { <-sema }()
	h.ServeHTTP(w, r)
default:
	// Semaphore full: fail fast instead of queueing the request.
	http.Error(w, "Overloaded", http.StatusServiceUnavailable)
}
If the reason for the rate limit is to avoid overloading a backend service, what you typically want to do is rather to react to that service being overloaded and apply back pressure through the request chain.
In practical terms, this could mean something as simple as putting the same kind of semaphore logic as above in a wrapper protecting all calls to the backend, and returning an error through the request's call chain if the semaphore overflows.
Additionally, if the backend sends status codes like 503 (or equivalent), you should typically propagate that indication downwards in the same way, or resort to some other fallback behaviour for handling the incoming request.
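To make that concrete, here is a rough sketch of both ideas: a non-blocking semaphore around the backend call, and translating the resulting error into a 503 for the client. The names (Limited, ErrOverloaded, handle) are placeholders for this example, not part of your code:

package wrappers

import (
	"errors"
	"net/http"
)

// ErrOverloaded signals that the backend semaphore is full.
var ErrOverloaded = errors.New("backend overloaded")

// Limited wraps a backend call with a non-blocking semaphore of size n.
type Limited struct {
	sema chan struct{}
	call func() error // stand-in for the actual Redis call
}

func NewLimited(n int, call func() error) *Limited {
	return &Limited{sema: make(chan struct{}, n), call: call}
}

// Do runs the backend call if a semaphore slot is free, otherwise fails fast.
func (l *Limited) Do() error {
	select {
	case l.sema <- struct{}{}:
		defer func() { <-l.sema }()
		return l.call()
	default:
		return ErrOverloaded
	}
}

// handle shows the error being propagated as a 503 to the client.
func handle(w http.ResponseWriter, r *http.Request, backend *Limited) {
	if err := backend.Do(); err != nil {
		http.Error(w, "Overloaded", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}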
You might also want to consider combining this with a circuit breaker, cutting off attempts to call the backend service quickly if it seems to be unresponsive or down.
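As a very rough illustration of the idea (in practice you would likely use an existing circuit breaker library; all names and thresholds here are made up), a breaker that trips after a number of consecutive failures could look along these lines:

package wrappers

import (
	"errors"
	"sync"
	"time"
)

// ErrCircuitOpen is returned while the breaker refuses calls.
var ErrCircuitOpen = errors.New("circuit breaker open")

// Breaker trips after maxFailures consecutive errors and stays open for cooldown.
type Breaker struct {
	mu          sync.Mutex
	failures    int
	openUntil   time.Time
	maxFailures int
	cooldown    time.Duration
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

func (b *Breaker) Call(f func() error) error {
	b.mu.Lock()
	if time.Now().Before(b.openUntil) {
		b.mu.Unlock()
		return ErrCircuitOpen // fail fast while the breaker is open
	}
	b.mu.Unlock()

	err := f()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openUntil = time.Now().Add(b.cooldown)
			b.failures = 0
		}
		return err
	}
	b.failures = 0
	return nil
}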
Rate limiting by capping the number of concurrent or queued connections as above is usually a good way to handle overload. When the backend service is overloaded, requests will typically take longer, which will then reduce the effective number of requests per second. However, if, for some reason, you want to have a fixed limit on the number of requests per second, you could do that with a rate.Limiter instead of a semaphore.
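For example, a sketch using golang.org/x/time/rate (the limit and burst values are placeholders you would have to tune):

package wrappers

import (
	"net/http"

	"golang.org/x/time/rate"
)

// RateLimit allows at most rps requests per second with the given burst size.
func RateLimit(h http.Handler, rps float64, burst int) http.Handler {
	limiter := rate.NewLimiter(rate.Limit(rps), burst)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !limiter.Allow() {
			// Over the per-second budget: reject instead of queueing.
			http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
			return
		}
		h.ServeHTTP(w, r)
	})
}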
The cost of sending and receiving trivial objects on a channel should be sub-microsecond. Even on a highly congested channel, it wouldn't be anywhere near 150 ms of additional latency just to synchronise on the channel. So, assuming the work done in the handler is otherwise the same, whatever your latency increase is caused by, it is almost certainly associated with goroutines waiting somewhere (e.g. on I/O, or to get access to synchronised regions that are blocked by other goroutines).
If you are getting incoming requests at a rate close to what can be handled with your set concurrency limit of 10000, or if you are getting spikes of requests, it is possible you would see such an increase in average latency stemming from goroutines in the wait queue on the channel.
Either way, this should be easily measurable; you could for example trace timestamps at certain points in the handling pathway. I would do this on a sample (e.g. 0.1%) of all requests to avoid having the log output affect the performance.
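A minimal sketch of such sampled timing as middleware (the 0.1% sample rate and the log format are just illustrative choices):

package wrappers

import (
	"log"
	"math/rand"
	"net/http"
	"time"
)

// Timed logs the handler latency for roughly 0.1% of requests.
func Timed(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if rand.Float64() < 0.001 {
			start := time.Now()
			defer func() {
				log.Printf("%s %s took %s", r.Method, r.URL.Path, time.Since(start))
			}()
		}
		h.ServeHTTP(w, r)
	})
}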
I'd use a slightly different mechanism for this, probably a worker pool as described here:
https://gobyexample.com/worker-pools
I'd actually say keep 10000 goroutines running (they'll be sleeping, waiting to receive on a blocking channel, so it's not really a waste of resources), and send the request and response to the pool as they come in.
If you want a timeout that responds with an error when the pool is full, you could implement that with a select block as well.
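A rough sketch of that idea, assuming each request blocks until a worker has finished it (the WorkerPool name and the 100 ms timeout are arbitrary choices for the example):

package wrappers

import (
	"net/http"
	"time"
)

type job struct {
	w    http.ResponseWriter
	r    *http.Request
	done chan struct{}
}

// WorkerPool serves requests from a fixed pool of n goroutines.
func WorkerPool(h http.Handler, n int) http.Handler {
	jobs := make(chan job)
	for i := 0; i < n; i++ {
		go func() {
			for j := range jobs {
				h.ServeHTTP(j.w, j.r)
				close(j.done) // signal the waiting handler that we're finished
			}
		}()
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		j := job{w: w, r: r, done: make(chan struct{})}
		select {
		case jobs <- j:
			<-j.done // the ResponseWriter is only valid until this handler returns
		case <-time.After(100 * time.Millisecond):
			http.Error(w, "Overloaded", http.StatusServiceUnavailable)
		}
	})
}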