
Thread pooling for HTTP requests

I have a few questions regarding architecture & performance for concurrency.

Setup:

There is a JavaFX GUI where users can start various tasks, each of which runs in its own thread (new Thread(new CustomTask<?>).start();). Each task loops over roughly 700k HTTP requests, adds the processed return values to a prepared insert statement, and executes the insert once the statement holds around 10k items. Progress is displayed in the GUI (ObservableList items).

[Image: visualisation of the setup]

Problem:

These tasks take a long time, and the bottleneck seems to be the time spent waiting for each HTTP response. (DB inserts are done with auto-commit turned off, in batches of 10k prepared insert statements.)

Target:

Improve overall performance by having the requests in separate tasks/threads.


Q1:

Is it even reasonable to use threads here? How else could I improve the performance?

Q2:

If threading is reasonable, how do I implement it? I was thinking about having a global thread pool or ExecutorService where the request tasks are queued. When a response is available, it is written to a synchronized list. Once the list holds 10k+ objects, a batch insert is executed.

Q3:

How do I determine a good thread-pool size? How do I tell which threads count towards it?

Thread.activeCount() returns 7 (current thread group)
ManagementFactory.getThreadMXBean().getThreadCount() returns 13 (threads overall?)
Runtime.getRuntime().availableProcessors() returns 8

I've read a few comments on multithreading and they all said that having more threads than cores does not necessarily improve performance (no “real” concurrency, timeslicing). If I had to guess, I'd say the number 13 includes some GUI threads. I can't work out how to get a useful number for the thread-pool size.


I appreciate any hint on how to improve my application.

Maze asked Sep 29 '22 06:09


2 Answers

Q1

First, it is unclear what you need to send back to the client. Do you have to talk to the database before you can send a response?

If you don't, this is the Pub/Sub pattern (i.e. fire and forget), and message queues or other pub/sub systems are ideal and scale far better than a plain ExecutorService. Some examples are AMQP, JMS, Redis Pub/Sub, and many more.

You can do pub/sub with a reply to the client, but this generally requires nonblocking client connections such as WebSockets or Comet, and it is fairly complicated to set up.

Q2 and Q3

If you DO need to reply to the client, then your problem follows the Request/Reply pattern, which is harder to scale.

One library on the JVM that does this well is Hystrix, which follows the command and bulkhead patterns. It gives you configurable, fault-tolerant Request/Reply along with request collapsing, which I believe addresses your: "If there are 10k+ objects in the list, execute a batch insert."

Figuring out the proper pool size is actually fairly complicated for blocking operations. For nonblocking work (i.e. CPU-bound or in-memory processing) it is simply the number of available processors, but that is not the case for you, since you are connecting to a database and probably using a blocking-IO servlet container.

To figure out the proper pool size for blocking operations you will have to use metrics and monitoring, which Hystrix provides out of the box. You should also be aware of your downstream dependencies. For example, if your database can only handle 200 concurrent connections, you do not want the thread pool that talks to the database to be bigger than 200.
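Until real metrics are available, a commonly cited rule of thumb gives a first guess: threads = cores * target utilization * (1 + wait time / compute time), capped at whatever the downstream system tolerates. The sketch below implements only that heuristic. The class name, method name, and timing numbers are assumptions for illustration; they are not measurements from the question and not part of Hystrix.

```java
final class PoolSizing {

    /**
     * Rule-of-thumb pool size for blocking work:
     * cores * targetUtilization * (1 + waitMillis / computeMillis),
     * capped at downstreamLimit (e.g. the connections a server accepts).
     */
    static int suggestedPoolSize(int cores, double targetUtilization,
                                 double waitMillis, double computeMillis,
                                 int downstreamLimit) {
        if (cores < 1 || downstreamLimit < 1 || computeMillis <= 0 || waitMillis < 0
                || targetUtilization <= 0 || targetUtilization > 1) {
            throw new IllegalArgumentException("arguments out of range");
        }
        double ideal = cores * targetUtilization * (1.0 + waitMillis / computeMillis);
        long rounded = Math.max(1L, Math.round(ideal));
        return (int) Math.min(rounded, (long) downstreamLimit);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Hypothetical per-request timings: 190 ms waiting on the network,
        // 10 ms of CPU work. With 8 cores: 8 * 1.0 * (1 + 190 / 10) = 160.
        System.out.println(suggestedPoolSize(cores, 1.0, 190, 10, 200));
    }
}
```

The result is only a starting value. The wait and compute times have to be measured on the real workload, and the cap should reflect the weakest downstream dependency.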

Adam Gent answered Oct 06 '22 00:10


Of course you can use ExecutorService.

I've read a few comments on multithreading and they all said that having more threads than cores does not necessarily improve performance (no “real” concurrency, timeslicing)

This is true for tasks that don't sleep or wait/block, such as calculating prime numbers or processing images. In your case, the HTTP client blocks until the response returns, and until that happens the thread remains idle. For an HTTP request executor, a pool size of 50, 100, or even 200 is okay.
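A small experiment can make this visible. In the sketch below, Thread.sleep stands in for a blocking HTTP call (an assumption, since no real requests are made), and the class and method names are hypothetical. It runs the same batch of blocking tasks on pools of different sizes and reports the wall-clock time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class BlockingPoolDemo {

    /** Runs taskCount blocking tasks on a fixed pool and returns elapsed milliseconds. */
    static long runMillis(int poolSize, int taskCount, long blockMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(() -> {
                Thread.sleep(blockMillis); // stand-in for waiting on an HTTP response
                return null;
            });
        }
        long start = System.nanoTime();
        try {
            pool.invokeAll(tasks); // returns once every task has completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("pool = cores: " + runMillis(cores, 100, 50) + " ms");
        System.out.println("pool = 100:   " + runMillis(100, 100, 50) + " ms");
    }
}
```

Because the tasks spend their time waiting rather than computing, the larger pool should finish much sooner even though it has far more threads than cores. With CPU-bound tasks the difference would largely disappear.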

The pattern could be the following:

ExecutorService es = Executors.newFixedThreadPool(50);

// ...
// creating request and response future associated with it
Future<Response> responseFuture = es.submit(new Callable<Response>() {
    @Override
    public Response call() throws Exception {
        // request data by HTTP
        return response;
    }
});
customTask.push(responseFuture);

In the customTask object, let's create a single-thread executor service which will operate on the list of Responses:

// requires imports from java.util and java.util.concurrent

// create a single-thread executor in order to accept responses
// one by one, in the order they were requested
ExecutorService customTaskService = Executors.newSingleThreadExecutor();
List<Response> results = new ArrayList<>();

// push() method
public void push(final Future<Response> responseFuture) {
    customTaskService.execute(new Runnable() {
        @Override
        public void run() {
            try {
                // responseFuture.get() will block inside this service thread,
                // though not affecting the GUI thread
                results.add(responseFuture.get(TIMEOUT, TimeUnit.SECONDS));
            } catch (InterruptedException e) {
                // restore the interrupt flag so the executor can shut down
                Thread.currentThread().interrupt();
            } catch (ExecutionException | TimeoutException e) {
                // processing of a request failed or timed out
            }
            if (results.size() > MAX_SIZE) {
                // make inserts to DB
                results.clear();
            }
        }
    });
}
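For reference, here is a self-contained sketch of the same two-executor pattern that can be run as is. It is an illustration under assumptions, not a drop-in implementation: Thread.sleep stands in for the HTTP call, two counters stand in for the database insert, and all names (BatchingPipeline, fetch, flush, finish) are hypothetical. It also adds one step the fragments above leave out, a final flush for the last partial batch before shutdown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

final class BatchingPipeline {
    private final ExecutorService requestPool;
    private final ExecutorService collector = Executors.newSingleThreadExecutor();
    private final List<String> buffer = new ArrayList<>(); // touched only by the collector thread
    private final int batchSize;
    private final AtomicInteger rowsWritten = new AtomicInteger();
    private final AtomicInteger batchesWritten = new AtomicInteger();

    BatchingPipeline(int poolSize, int batchSize) {
        this.requestPool = Executors.newFixedThreadPool(poolSize);
        this.batchSize = batchSize;
    }

    /** Queues one request and hands its future to the collector thread. */
    void submit(int id) {
        Future<String> future = requestPool.submit(() -> fetch(id));
        collector.execute(() -> collect(future));
    }

    // Stand-in for a blocking HTTP request (assumption: about 20 ms latency).
    private static String fetch(int id) throws InterruptedException {
        Thread.sleep(20);
        return "item-" + id;
    }

    private void collect(Future<String> future) {
        try {
            buffer.add(future.get(5, TimeUnit.SECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException | TimeoutException e) {
            // request failed or timed out: skip this item
        }
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Stand-in for the batch insert: only counts rows and batches.
    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        rowsWritten.addAndGet(buffer.size());
        batchesWritten.incrementAndGet();
        buffer.clear();
    }

    /** Writes the last partial batch and shuts both executors down. */
    void finish() {
        requestPool.shutdown();
        // Queued behind all pending collect tasks, so it runs last.
        Future<?> lastFlush = collector.submit(() -> { flush(); });
        collector.shutdown();
        try {
            lastFlush.get(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("could not drain pipeline", e);
        }
    }

    int rowsWritten() { return rowsWritten.get(); }

    int batchesWritten() { return batchesWritten.get(); }

    public static void main(String[] args) {
        BatchingPipeline pipeline = new BatchingPipeline(50, 100);
        for (int id = 0; id < 250; id++) {
            pipeline.submit(id);
        }
        pipeline.finish();
        System.out.println(pipeline.rowsWritten() + " rows in "
                + pipeline.batchesWritten() + " batches");
    }
}
```

Because only the collector thread touches the buffer, no synchronized list is needed. One trade-off of collecting strictly in submission order is that a single slow request delays the results queued behind it.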
Alex Salauyou answered Oct 06 '22 00:10