You can implement throttling by adding @Throttling annotation to the service method of the request that should be throttled. As you can see @Throttling annotation alone is equivalent to the annotation below with parameters. The default is 1 method call per second.
Best practices to handle throttling The following are best practices for handling throttling: Reduce the degree of parallelism. Reduce the frequency of calls. Avoid immediate retries because all requests accrue against your usage limits.
Throttling allows you to limit how many requests a session can execute concurrently. This is useful if you have multiple applications connecting to the same Cassandra cluster, and want to enforce some kind of SLA to ensure fair resource allocation.
Throttling is a server side response where feedback is provided to the caller indicating that there are too many requests coming in from that client or that the server is overloaded and needs clients to slow down their rate of requests. Rate Limiting is a client side response to the maximum capacity of a channel.
I'd use a ring buffer of timestamps with a fixed size of M. Each time the method is called, you check the oldest entry, and if it's less than N seconds in the past, you execute and add another entry, otherwise you sleep for the time difference.
What worked out of the box for me was Google Guava RateLimiter.
// Allow one request per second
private RateLimiter throttle = RateLimiter.create(1.0);
private void someMethod() {
throttle.acquire();
// Do something
}
In concrete terms, you should be able to implement this with a DelayQueue
. Initialize the queue with M
Delayed
instances with their delay initially set to zero. As requests to the method come in, take
a token, which causes the method to block until the throttling requirement has been met. When a token has been taken, add
a new token to the queue with a delay of N
.
Read up on the Token bucket algorithm. Basically, you have a bucket with tokens in it. Every time you execute the method, you take a token. If there are no more tokens, you block until you get one. Meanwhile, there is some external actor that replenishes the tokens at a fixed interval.
I'm not aware of a library to do this (or anything similar). You could write this logic into your code or use AspectJ to add the behavior.
If you need a Java based sliding window rate limiter that will operate across a distributed system you might want to take a look at the https://github.com/mokies/ratelimitj project.
A Redis backed configuration, to limit requests by IP to 50 per minute would look like this:
import com.lambdaworks.redis.RedisClient;
import es.moki.ratelimitj.core.LimitRule;
RedisClient client = RedisClient.create("redis://localhost");
Set<LimitRule> rules = Collections.singleton(LimitRule.of(1, TimeUnit.MINUTES, 50)); // 50 request per minute, per key
RedisRateLimit requestRateLimiter = new RedisRateLimit(client, rules);
boolean overLimit = requestRateLimiter.overLimit("ip:127.0.0.2");
See https://github.com/mokies/ratelimitj/tree/master/ratelimitj-redis fore further details on Redis configuration.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With