 

How many threads does it take to make them a bad choice?

I have to write a not-so-large program in C++, using boost::thread.

The problem at hand is to process a large number of possibly large files: maybe thousands or tens of thousands, and possibly hundreds of thousands or millions. Each file is independent of the others, and they all reside in the same directory. I'm thinking of using a multi-threaded approach, but the question is: how many threads should I use? I mean, what order of magnitude? 10, 500, 12400?

There are some synchronization issues: each thread should return a struct of values (accumulated for each file), and those are added to a "global" struct to get the overall data. I realize that some threads could "get hungry" (starve) because of synchronization, but if it's only an add operation, does it matter?
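To make the "only an add operation" case concrete, here is a minimal sketch of the accumulation step. The `Stats` fields and the `Totals` class are hypothetical names, not taken from the question. It uses `std::mutex` from C++11 only to stay self-contained; `boost::mutex` with `boost::lock_guard` has the same shape. Each thread fills a local `Stats` without any locking and takes the lock once per finished file, so the critical section is just a few additions.

```cpp
#include <cstddef>
#include <mutex>

// Hypothetical per-file result; the field names are illustrative only.
struct Stats {
    std::size_t files = 0;
    std::size_t bytes = 0;
    std::size_t records = 0;

    // Adds another result into this one, field by field.
    void add(const Stats& other) {
        files += other.files;
        bytes += other.bytes;
        records += other.records;
    }
};

// "Global" accumulator guarded by a mutex.
class Totals {
public:
    // Intended to be called once per finished file, so the lock is held
    // only for the duration of three additions.
    void merge(const Stats& local) {
        std::lock_guard<std::mutex> lock(mutex_);
        total_.add(local);
    }

    // Returns a copy of the current totals.
    Stats snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return total_;
    }

private:
    mutable std::mutex mutex_;
    Stats total_;
};
```

If processing one file takes far longer than the merge, as it probably does for large files, contention on this lock should be negligible compared with the I/O.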

I was thinking of

for (each file f in directory) {
    // N is a static variable counting the threads currently running
    while (N >= max_threads)
        sleep()            // wait for a free slot instead of skipping f
    thread_process(f)
}
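One alternative to counting running threads and sleeping is a fixed set of worker threads that pull files from a shared index. The sketch below is only an illustration of that idea, not the asker's code: `process_all` and the `process_file` callback are hypothetical names, the per-file result is reduced to a single `long`, and `std::thread` is used in place of `boost::thread` to keep the example self-contained.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Processes every file with a fixed number of worker threads and returns the
// sum of the per-file results. process_file is a hypothetical callback that
// stands in for the real per-file work.
long process_all(const std::vector<std::string>& files,
                 std::size_t num_threads,
                 const std::function<long(const std::string&)>& process_file) {
    std::atomic<std::size_t> next(0);  // index of the next unclaimed file
    std::mutex total_mutex;
    long total = 0;

    auto worker = [&]() {
        long local = 0;  // accumulated without locking
        for (;;) {
            std::size_t i = next.fetch_add(1);  // claim one file
            if (i >= files.size()) break;       // nothing left to do
            local += process_file(files[i]);
        }
        // One short critical section per worker.
        std::lock_guard<std::mutex> lock(total_mutex);
        total += local;
    };

    if (num_threads == 0) num_threads = 1;
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < num_threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return total;
}
```

With this shape the number of threads is a single parameter, so it can be tuned on the remote server without changing the loop, and no file is skipped while waiting for a free slot.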

This is on HP-UX, but I won't be able to test it often, since it's a remote and quite inaccessible server.

Tom asked Sep 17 '09 01:09





1 Answer

According to Amdahl's law, which Herb Sutter discusses in his article:

Some amount of a program's processing is fully "O(N)" parallelizable (call this portion p), and only that portion can scale directly on machines having more and more processor cores. The rest of the program's work is "O(1)" sequential (s). [1,2] Assuming perfect use of all available cores and no parallelization overhead, Amdahl's Law says that the best possible speedup of that program workload on a machine with N cores is given by
$$\text{Speedup} = \frac{s + p}{s + \frac{p}{N}}$$
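As a quick illustration of the quoted law, with s the sequential share, p the parallelizable share, and N the number of cores, the best-case speedup is (s + p) / (s + p/N). The helper below is a hypothetical function written for this example, and the input values are made up.

```cpp
// Best-case speedup on n cores for a workload with sequential share s and
// parallelizable share p, per Amdahl's law: (s + p) / (s + p / n).
// Assumes n >= 1 and s + p > 0.
double amdahl_speedup(double s, double p, unsigned n) {
    return (s + p) / (s + p / n);
}
```

For example, with s = 0.1 and p = 0.9, 8 cores give a speedup of about 4.7, and no number of cores can exceed 1/s = 10. The sequential share puts a ceiling on what extra threads can buy.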

In your case, I/O operations could take most of the time, along with synchronization. You could measure the time spent in blocking (slow) I/O operations and use it to estimate a number of threads suitable for your task.
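One common rule of thumb for turning such a measurement into a thread count (an assumption added here, not something stated in this answer) is threads ≈ cores × (1 + wait time / compute time), where both times are measured per file. The function name and the numbers below are hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Rough thread-count estimate from measured per-file times:
// cores * (1 + wait_time / compute_time), rounded up.
// Falls back to the core count when the measurements are unusable.
std::size_t estimate_threads(std::size_t cores,
                             double wait_time,
                             double compute_time) {
    if (cores == 0) cores = 1;  // e.g. core count could not be detected
    if (compute_time <= 0.0 || wait_time < 0.0) return cores;
    double estimate = cores * (1.0 + wait_time / compute_time);
    return static_cast<std::size_t>(std::ceil(estimate));
}
```

For instance, 4 cores with 30 ms of blocking I/O and 10 ms of computation per file give an estimate of 16 threads, which is in the "tens" range rather than hundreds or thousands. The core count could come from `boost::thread::hardware_concurrency()`, which may return 0 when the value is unavailable. Since all the files sit in one directory, the disk itself may saturate earlier than this estimate suggests, so it is best treated as a starting point for measurement.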


A full list of concurrency-related articles by Herb Sutter can be found here.

Kirill V. Lyadvinsky answered Sep 20 '22 05:09
