I am working on an application which needs to test thousands of proxy servers continuously. The application is based around Spring Boot.
The current approach I am using is an @Async-decorated method which takes a proxy server and returns the result.
I am often getting OutOfMemoryError and the processing is very slow. I assume that is because each async method is executed in a separate thread which blocks on I/O?
Everywhere I read about async in Java, people mix parallel execution in threads with non-blocking I/O. In the Python world, there is asyncio, which executes I/O requests in a single thread: while one method is waiting for a response from the server, it starts executing another method.
I think in my case I need something like this, because Spring's @Async is not suitable for me. Can someone please help clear up my confusion and suggest how I should go about this challenge?
I want to check hundreds of proxies simultaneously without putting excessive load on the machine. I have read about Apache Async HTTP Client, but I don't know if it is suitable.
This is the thread pool configuration I am using:
public Executor proxyTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(Runtime.getRuntime().availableProcessors() * 2 - 1);
    executor.setMaxPoolSize(100);
    executor.setDaemon(true);
    return executor;
}
I am often getting OutOfMemoryError and the processing is very slow. I assume that is because each async method is executed in a separate thread which blocks on I/O?
For the OOME, I explain it in the second point below.
About the slowness, it is indeed related to the I/O performed while processing the requests and responses.
The problem comes from the number of threads that effectively run in parallel.
With your current configuration, the maximum pool size is never reached (I explain why below).
Suppose that corePoolSize == 10 in your case. It means that 10 threads run in parallel. Suppose also that each thread takes about 3 seconds to test a site.
The effective throughput is then one site per 0.3 seconds, so testing 1000 sites takes about 300 seconds.
That is quite slow, and a large part of that time is waiting time: the I/O needed to send the request to, and receive the response from, the site currently being tested.
To increase the overall speed, you should probably run many more threads in parallel than your number of cores. That way, I/O waiting time becomes less of a problem: while some threads are blocked waiting for a response, the scheduler can run other threads, so the waiting does not leave the CPU idle.
This should address the OOME issue and probably improve the execution time significantly, but there is no guarantee that you will get a very short runtime.
To achieve that, you would probably need to design the multi-threading logic more finely and rely on APIs/libraries with non-blocking I/O, as sketched below.
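As an illustration (not from the original answer), here is a minimal sketch of a non-blocking proxy check using the JDK 11 java.net.http.HttpClient, which returns a CompletableFuture per request instead of blocking one thread per proxy. The target URL, the timeouts, and the NonBlockingProxyChecker class are assumptions made for this example:

import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

public class NonBlockingProxyChecker {

    // Hypothetical target used to verify that the proxy forwards traffic.
    private static final URI TEST_URI = URI.create("https://example.org/");

    // Completes with true if the proxied request got a 2xx/3xx answer, false otherwise.
    public CompletableFuture<Boolean> check(String host, int port) {
        HttpClient client = HttpClient.newBuilder()
                .proxy(ProxySelector.of(new InetSocketAddress(host, port)))
                .connectTimeout(Duration.ofSeconds(5))
                .build();

        HttpRequest request = HttpRequest.newBuilder(TEST_URI)
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();

        // sendAsync does not block the calling thread; the future completes
        // when the response (or an error) arrives.
        return client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                .thenApply(response -> response.statusCode() < 400)
                .exceptionally(error -> false);
    }
}

Hundreds of such checks can then be started at once and joined with CompletableFuture.allOf(...), and the number of in-flight requests can be capped (for example with a Semaphore) without dedicating a blocked thread to each proxy.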
Here is some information from the official documentation that should be helpful.
This part explains the overall logic when a task is submitted (emphasis is mine):
The configuration of the thread pool should also be considered in light of the executor’s queue capacity. For the full description of the relationship between pool size and queue capacity, see the documentation for ThreadPoolExecutor. The main idea is that, when a task is submitted, the executor first tries to use a free thread if the number of active threads is currently less than the core size. If the core size has been reached, the task is added to the queue, as long as its capacity has not yet been reached. Only then, if the queue’s capacity has been reached, does the executor create a new thread beyond the core size. If the max size has also been reached, then the executor rejects the task.
And this explains the consequences of the queue size (emphasis is still mine):
By default, the queue is unbounded, but this is rarely the desired configuration, because it can lead to OutOfMemoryErrors if enough tasks are added to that queue while all pool threads are busy. Furthermore, if the queue is unbounded, the max size has no effect at all. Since the executor always tries the queue before creating a new thread beyond the core size, a queue must have a finite capacity for the thread pool to grow beyond the core size (this is why a fixed-size pool is the only sensible case when using an unbounded queue).
Long story short: you didn't set the queue capacity, which by default is unbounded (Integer.MAX_VALUE). So you fill the queue with several hundreds of tasks that will only be popped much later. These tasks use a lot of memory, hence the OOME.
Besides, as explained in the documentation, this setting is useless with an unbounded queue, because a new thread is created only when the queue is full:
executor.setMaxPoolSize(100);
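To make this concrete, here is a small standalone sketch (my addition, not part of the original answer) that submits many slow no-op tasks to an executor configured like yours; with the default unbounded queue, the pool size stays at the core size while the queue keeps growing:

import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

public class UnboundedQueueDemo {

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(10);
        executor.setMaxPoolSize(100);  // never reached: the unbounded queue is always tried first
        executor.initialize();

        // Submit far more tasks than the core threads can absorb.
        for (int i = 0; i < 1_000; i++) {
            executor.execute(() -> {
                try {
                    Thread.sleep(3_000);  // simulate a slow proxy check
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        Thread.sleep(1_000);
        System.out.println("pool size  = " + executor.getPoolSize());                              // ~10
        System.out.println("queue size = " + executor.getThreadPoolExecutor().getQueue().size());  // ~990
        executor.shutdown();
    }
}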
Setting both of them to relevant values makes more sense:
public Executor proxyTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(Runtime.getRuntime().availableProcessors() * 2 - 1);
    executor.setMaxPoolSize(100);
    // Bounded queue: once it is full, the pool can grow up to maxPoolSize
    // instead of queueing tasks (and memory) indefinitely.
    executor.setQueueCapacity(100);
    executor.setDaemon(true);
    return executor;
}
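One detail worth noting (my addition): with a bounded queue, once both the queue and the max pool size are saturated, further submissions are rejected (Spring wraps the rejection in a TaskRejectedException). If you prefer to throttle the submitter instead of failing, a common option is the JDK's CallerRunsPolicy:

// Assumption: "executor" is the ThreadPoolTaskExecutor configured above.
// CallerRunsPolicy makes the submitting thread run the task itself when the
// pool and queue are full, which naturally slows down submission.
executor.setRejectedExecutionHandler(new java.util.concurrent.ThreadPoolExecutor.CallerRunsPolicy());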
Or, as an alternative, use a fixed-size pool with the same value for the core and the max pool size:
Rather than only a single size, an executor’s thread pool can have different values for the core and the max size. If you provide a single value, the executor has a fixed-size thread pool (the core and max sizes are the same).
public Executor proxyTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(100);
    executor.setMaxPoolSize(100);
    executor.setDaemon(true);
    return executor;
}
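For completeness, here is a hedged sketch of how such an executor is typically wired to @Async in a Spring Boot application. The bean name proxyTaskExecutor is taken from your code; the ProxyChecker service and its test method are assumptions for the example:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Service;

@Configuration
@EnableAsync
class AsyncConfig {

    // Same fixed-size pool as above, registered under an explicit bean name.
    @Bean("proxyTaskExecutor")
    public Executor proxyTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(100);
        executor.setMaxPoolSize(100);
        executor.setDaemon(true);
        return executor;
    }
}

@Service
class ProxyChecker {

    // Runs on the "proxyTaskExecutor" pool instead of Spring's default executor.
    @Async("proxyTaskExecutor")
    public CompletableFuture<Boolean> test(String host, int port) {
        boolean reachable = /* blocking proxy check goes here */ false;
        return CompletableFuture.completedFuture(reachable);
    }
}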
Note also that invoking the async service 1000 times without any pause seems harmful in terms of memory, since the executor cannot handle them all right away. You should probably split these invocations into smaller batches (2, 3 or more) and perform Thread.sleep() between them, as sketched below.
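A minimal sketch of that batching idea (my illustration; the batch size, the pause, and the ProxyChecker from the previous sketch are assumptions to adapt to your case):

import java.util.List;

public class BatchedSubmission {

    // Submits proxies to the async checker in batches with a pause in between,
    // so the executor's queue never holds the whole workload at once.
    public void submitInBatches(List<String> proxies, ProxyChecker checker) throws InterruptedException {
        int batchSize = 250;  // assumption: ~1000 proxies split into 4 batches
        for (int i = 0; i < proxies.size(); i += batchSize) {
            for (String proxy : proxies.subList(i, Math.min(i + batchSize, proxies.size()))) {
                checker.test(proxy, 8080);  // hypothetical @Async method from the sketch above
            }
            Thread.sleep(5_000);  // give the pool time to drain before the next batch
        }
    }
}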