
Single thread pool vs one thread pool per task

I want to use concurrency in Java to make requests to an online API, download and parse the response documents, and load the resulting data into a database.

Is it standard to have one pool of threads in which each thread requests, parses, and loads? In other words, only one class implements Runnable. Or is it more efficient to have, say, three different pools of threads, with the first pool of threads making the requests and pushing them to a queue, the second pool of threads polling from the first queue, parsing, and pushing the parsed data to a second queue, and finally the third pool polling the data from the second queue and loading into the database? In this case, I'd write three different classes that implement Runnable.

user1660310 asked Dec 26 '22 18:12



2 Answers

You have to consider which parts of the processing will benefit from parallelism. The online API communication is the most likely candidate, since it involves sockets and network waits. The same goes for the DB interaction. Parsing is CPU-bound, so multithreaded parsing will probably only improve performance if multiple CPU cores are available.
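One way to reflect this distinction is in how the pools are sized. The following is only a rough sketch: the class name, the method names, and the `waitFactor` multiplier are hypothetical, and the right numbers depend on profiling your own workload. It assumes that threads which mostly wait on the network or the database can outnumber the cores, while parsing threads should not.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizingSketch {

    // CPU-bound work (parsing): more threads than cores rarely helps.
    static int parsingThreads() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    // I/O-bound work (API calls, DB writes): threads spend most of their time
    // waiting, so a multiple of the core count is a common starting point.
    // waitFactor is an assumed tuning knob, not a recommended value.
    static int ioThreads(int waitFactor) {
        if (waitFactor < 1) {
            throw new IllegalArgumentException("waitFactor must be >= 1");
        }
        return parsingThreads() * waitFactor;
    }

    public static void main(String[] args) {
        ExecutorService ioPool = Executors.newFixedThreadPool(ioThreads(4));
        ExecutorService parsePool = Executors.newFixedThreadPool(parsingThreads());
        System.out.println("I/O threads: " + ioThreads(4)
                + ", parsing threads: " + parsingThreads());
        ioPool.shutdown();
        parsePool.shutdown();
    }
}
```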

Splitting the entire process into 3 separate classes will definitely increase cohesion, meaning each class will have fewer responsibilities, which is a good thing. On the other hand, making each of these classes a Runnable and having several queues will increase the complexity of the application, possibly unnecessarily.

I would suggest making 3 separate classes, but don't make them Runnable. Then make a single Runnable that contains and orchestrates the 3 classes and runs on one thread pool. If profiling shows that this isn't fast enough, try splitting the Runnable across 2 thread pools: one for download and parse, and one for DB access.
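A minimal sketch of that single-pool layout might look like the following. All class and method names here (`Downloader`, `Parser`, `Loader`, `PipelineTask`, `runAll`) are hypothetical, and the download, parse, and load bodies are stand-ins: a real version would call the API, parse the actual documents, and write to a database.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SinglePoolSketch {

    // Stand-in for the API client; a real one would make an HTTP request.
    static class Downloader {
        String download(String url) { return "raw:" + url; }
    }

    // Stand-in for the document parser.
    static class Parser {
        String parse(String raw) { return "parsed:" + raw.substring("raw:".length()); }
    }

    // Stand-in for the DB loader; collects rows in a thread-safe list.
    static class Loader {
        final List<String> rows = new CopyOnWriteArrayList<>();
        void load(String record) { rows.add(record); }
    }

    // The only Runnable: it orchestrates the three plain classes for one URL.
    static class PipelineTask implements Runnable {
        private final String url;
        private final Downloader downloader;
        private final Parser parser;
        private final Loader loader;

        PipelineTask(String url, Downloader d, Parser p, Loader l) {
            this.url = url;
            this.downloader = d;
            this.parser = p;
            this.loader = l;
        }

        @Override
        public void run() {
            loader.load(parser.parse(downloader.download(url)));
        }
    }

    // Submits one task per URL to a single fixed-size pool and waits for completion.
    static List<String> runAll(List<String> urls, int threads) {
        Downloader d = new Downloader();
        Parser p = new Parser();
        Loader l = new Loader();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String url : urls) {
            pool.execute(new PipelineTask(url, d, p, l));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return l.rows;
    }

    public static void main(String[] args) {
        System.out.println(runAll(List.of("a", "b", "c"), 4));
    }
}
```

Because each task runs independently, the rows can arrive in any order. Splitting into two pools later only requires moving the `loader.load(...)` call into a second task type, since the three classes themselves know nothing about threading.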

The point being, start simple and add complexity as needed.

Brady answered Jan 09 '23 23:01



One important thing to consider: does the order of the processing matter? That is, is it important that the parsed result from the first download request gets loaded into the DB before the results from the second request?

If so, you really need queues (or similar), one per task. In effect, that means three single-threaded thread "pools" (or use an ExecutorService).

If not, @Brady makes good points. Unlike him, I'd probably make all three classes Runnable, but that doesn't mean you have to use three queues. You could still try a single pool and profile to see how it is working.
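For the case where order does matter, here is a rough sketch of the three single-threaded stages connected by queues. The class name, the `run` method, and the stage bodies are hypothetical stand-ins, and an empty `Optional` is used as an assumed end-of-stream marker. Since each stage has exactly one thread and each queue is FIFO, records reach the "database" in the order the URLs were requested.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class OrderedPipelineSketch {

    static List<String> run(List<String> urls) {
        // An empty Optional marks the end of the stream on each queue.
        BlockingQueue<Optional<String>> downloaded = new LinkedBlockingQueue<>();
        BlockingQueue<Optional<String>> parsed = new LinkedBlockingQueue<>();
        // Stand-in for the database.
        List<String> database = Collections.synchronizedList(new ArrayList<>());

        ExecutorService downloader = Executors.newSingleThreadExecutor();
        ExecutorService parser = Executors.newSingleThreadExecutor();
        ExecutorService loader = Executors.newSingleThreadExecutor();

        // Stage 1: "download" each URL in order and pass it on.
        downloader.execute(() -> {
            try {
                for (String url : urls) {
                    downloaded.put(Optional.of("raw:" + url));
                }
                downloaded.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2: parse in arrival order, then forward the end marker.
        parser.execute(() -> {
            try {
                Optional<String> raw;
                while ((raw = downloaded.take()).isPresent()) {
                    parsed.put(Optional.of("parsed:" + raw.get().substring("raw:".length())));
                }
                parsed.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 3: load in arrival order until the end marker shows up.
        loader.execute(() -> {
            try {
                Optional<String> record;
                while ((record = parsed.take()).isPresent()) {
                    database.add(record.get());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        downloader.shutdown();
        parser.shutdown();
        loader.shutdown();
        try {
            if (!loader.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pipeline did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new ArrayList<>(database);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b", "c")));
    }
}
```

The stages still overlap in time (the second download can run while the first document is being parsed), but adding a second thread to any stage would give up the ordering guarantee.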

user949300 answered Jan 09 '23 21:01
