Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How many threads should I use in my Java program?

I recently inherited a small Java program that takes information from a large database, does some processing and produces a detailed image regarding the information. The original author wrote the code using a single thread, then later modified it to allow it to use multiple threads.

In the code he defines a constant;

//  number of threads
public static final int THREADS =  Runtime.getRuntime().availableProcessors();

Which then sets the number of threads that are used to create the image.

I understand his reasoning that the number of threads cannot be greater than the number of available processors, so set it the the amount to get the full potential out of the processor(s). Is this correct? or is there a better way to utilize the full potential of the processor(s)?

EDIT: To give some more clarification, The specific algorithm that is being threaded scales to the resolution of the picture being created, (1 thread per pixel). That is obviously not the best solution though. The work that this algorithm does is what takes all the time, and is wholly mathematical operations, there are no locks or other factors that will cause any given thread to sleep. I just want to maximize the programs CPU utilization to decrease the time to completion.

like image 499
Andrew Avatar asked Sep 24 '08 23:09

Andrew


People also ask

How many threads should I use Java?

You can have hundreds of threads, but if the vast majority of them are just waiting, they won't incur much overhead. On the other hand, if you are doing computation, I would limit the number of threads to the number of cores. You won't get any extra pure computation speed from scheduling more.

How do you decide how many threads to use?

Ideally the total thread count for all the jobs should be the number of cores of the system, except on systems that support hyper-threading, in which it should be twice the number of cores. So if the system doesn't have hyper-threading, there are 8 calculations running, each should run in one thread.

How many threads can a Java program run?

Each JVM server can have a maximum of 256 threads to run Java applications. In a CICS region you can have a maximum of 2000 threads. If you have many JVM servers running in the CICS region (for example, more than seven), you cannot set the maximum value for every JVM server.

How many threads can I use for my program?

On Windows machines, there's no limit specified for threads. Thus, we can create as many threads as we want, until our system runs out of available system memory.


4 Answers

Threads are fine, but as others have noted, you have to be highly aware of your bottlenecks. Your algorithm sounds like it would be susceptible to cache contention between multiple CPUs - this is particularly nasty because it has the potential to hit the performance of all of your threads (normally you think of using multiple threads to continue processing while waiting for slow or high latency IO operations).

Cache contention is a very important aspect of using multi CPUs to process a highly parallelized algorithm: Make sure that you take your memory utilization into account. If you can construct your data objects so each thread has it's own memory that it is working on, you can greatly reduce cache contention between the CPUs. For example, it may be easier to have a big array of ints and have different threads working on different parts of that array - but in Java, the bounds checks on that array are going to be trying to access the same address in memory, which can cause a given CPU to have to reload data from L2 or L3 cache.

Splitting the data into it's own data structures, and configure those data structures so they are thread local (might even be more optimal to use ThreadLocal - that actually uses constructs in the OS that provide guarantees that the CPU can use to optimize cache.

The best piece of advice I can give you is test, test, test. Don't make assumptions about how CPUs will perform - there is a huge amount of magic going on in CPUs these days, often with counterintuitive results. Note also that the JIT runtime optimization will add an additional layer of complexity here (maybe good, maybe not).

like image 154
Kevin Day Avatar answered Oct 24 '22 23:10

Kevin Day


On the one hand, you'd like to think Threads == CPU/Cores makes perfect sense. Why have a thread if there's nothing to run it?

The detail boils down to "what are the threads doing". A thread that's idle waiting for a network packet or a disk block is CPU time wasted.

If your threads are CPU heavy, then a 1:1 correlation makes some sense. If you have a single "read the DB" thread that feeds the other threads, and a single "Dump the data" thread and pulls data from the CPU threads and create output, those two could most likely easily share a CPU while the CPU heavy threads keep churning away.

The real answer, as with all sorts of things, is to measure it. Since the number is configurable (apparently), configure it! Run it with 1:1 threads to CPUs, 2:1, 1.5:1, whatever, and time the results. Fast one wins.

like image 45
Will Hartung Avatar answered Oct 24 '22 23:10

Will Hartung


The number that your application needs; no more, and no less.

Obviously, if you're writing an application which contains some parallelisable algorithm, then you can probably start benchmarking to find a good balance in the number of threads, but bear in mind that hundreds of threads won't speed up any operation.

If your algorithm can't be parallelised, then no number of additional threads is going to help.

like image 32
Rob Avatar answered Oct 24 '22 23:10

Rob


Yes, that's a perfectly reasonable approach. One thread per processor/core will maximize processing power and minimize context switching. I'd probably leave that as-is unless I found a problem via benchmarking/profiling.

One thing to note is that the JVM does not guarantee availableProcessors() will be constant, so technically, you should check it immediately before spawning your threads. I doubt that this value is likely to change at runtime on typical computers, though.

P.S. As others have pointed out, if your process is not CPU-bound, this approach is unlikely to be optimal. Since you say these threads are being used to generate images, though, I assume you are CPU bound.

like image 34
Derek Park Avatar answered Oct 25 '22 00:10

Derek Park