 

ThreadPool callbacks in tight loop - 100% CPU

Tags: c#, cpu, threadpool

I have a method in my algorithm that runs a very tight loop over a very large set of data. I originally wrote it single-threaded, which worked but took a long time. I am now at the point of wanting to speed it up, so I am using the ThreadPool to parallelize the work. The problem is that this pushes my CPU usage to 95-100%, which I sort of expected. Performance has increased dramatically, but I think it could be even better if I could cut down on all the context switching. The high CPU usage also makes my other programs a bit laggy, since they have to fight my threads for CPU resources.

My question is: how should I go about this? The only thing I've been able to think of is to limit the number of threads running at one time, but that may make my algorithm slower, since only a few threads would be able to run at once. I don't want to add sleeps to my threads either, since I just need the algorithm to run to completion as quickly as possible.

EDIT: Several people have mentioned using the TPL. I think that is a great idea, but unfortunately I forgot to mention that I am stuck using .NET 3.5 since the parent application hasn't released a version using .NET 4 yet.
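For reference, the pattern described (queueing each chunk of the data set to the ThreadPool and waiting for everything to finish) might look roughly like the sketch below. This is not the asker's actual code; chunks and DoWork are hypothetical stand-ins for the real data and inner loop, and everything used here is available in .NET 3.5:

// Hypothetical sketch of the approach described above: queue each chunk to the
// ThreadPool and block until the last callback signals completion.
int pending = chunks.Length;
using (var done = new ManualResetEvent(false))
{
  foreach (var chunk in chunks)
  {
    var item = chunk;                        // capture a copy for the closure
    ThreadPool.QueueUserWorkItem(_ =>
    {
      DoWork(item);                          // the tight inner loop
      if (Interlocked.Decrement(ref pending) == 0)
        done.Set();                          // last item releases the waiter
    });
  }
  done.WaitOne();                            // block until all queued work is done
}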

Asked Apr 13 '12 by Nathan Phetteplace


2 Answers

This is all about resource management. Your program is currently hogging all the resources, and so other programs get reduced access to them. You need to balance the "I just need the algorithm to run to completion as quickly as possible" part with the "This also causes my other programs to be a bit laggy since they have to fight the threads for CPU resources". They are mutually exclusive; you cannot have your app run as fast as it possibly can on a particular machine and also keep other apps perfectly responsive. There is simply a limit to how much the CPU can do in any stretch of time.

As far as efficiency gains, there are a few things you can do:

  • Don't use the ThreadPool for ultra-optimized threaded algorithms. The ThreadPool is excellent for simple "go off and do this, then let me know you're done" operations. However, if you're looking to optimize, the overhead of adding another layer of thread scheduling with the ThreadPool (on top of the scheduling the CPU and OS already do) can be avoided. You also have more limited control over ThreadPool threads, so optimizations such as assigning processor affinity (to load-balance) or priority (to give a thread more or less time) to individual threads are not available. Try creating simple Threads (see the sketch after this list), or look into the TPL, which has a number of strategies for getting multiple things done (not all of which require threading in the first place).

  • Yes, you'll want to be able to "throttle" the number of threads. This both leaves other programs some CPU time by reducing your program's demand for it and, as I said, avoids some of the overhead inherent in multithreading itself. The rule of thumb is that if a CPU is running more than about twice as many active threads as it has "execution units" (the physical cores on the CPU chip, plus the "logical processors" created by technologies like HyperThreading that split one core into two), the OS will spend more time scheduling threads and switching between them (and thrashing their caches) than actually running them. More generally, there is a law of diminishing returns that eventually turns into "diseconomies of scale": at some point, adding another thread makes your program run more slowly than if you hadn't added it. Yes, the ThreadPool handles a maximum thread count for you, but that's probably the simplest of its features to implement yourself in your own algorithm (the sketch after this list does exactly that).

  • Make sure that each thread's work is optimized. Look for naive or inefficient algorithms (I call them "O(My God)-complexity") and streamline them. There is a lower limit to the efficiency of most operations (it varies by the type of operation), and "premature optimization is the root of all evil" (don't optimize performance at the expense of making the code actually work), but understand that in a multithreaded environment, any gain you can make on the efficiency of an algorithm when run one time will be multiplied by the number of times you're running it, so making sure a parallel operation is efficient is a double bonus.
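As a concrete illustration of the first two points, here is a minimal sketch (not the asker's actual code; arrayOfStuff and DoSomething are stand-in names) that replaces the ThreadPool with a small, fixed set of dedicated Threads, throttled to one per core with one core left free, and run at below-normal priority so other applications stay responsive:

int workerCount = Math.Max(1, Environment.ProcessorCount - 1);  // leave one core free
var workers = new Thread[workerCount];

for (int w = 0; w < workerCount; ++w)
{
  int workerIndex = w;                                // capture a copy for the closure
  workers[w] = new Thread(() =>
  {
    // Strided partition: worker k handles elements k, k + workerCount, k + 2*workerCount, ...
    // so the data is split up front and the threads never contend over a shared queue.
    for (int i = workerIndex; i < arrayOfStuff.Length; i += workerCount)
      DoSomething(arrayOfStuff[i]);
  });
  workers[w].Priority = ThreadPriority.BelowNormal;   // give interactive apps first claim on the CPU
  workers[w].Start();
}

foreach (var worker in workers)
  worker.Join();                                      // wait for every partition to finish

The strided partition is only one option; handing each thread a contiguous block can be friendlier to the CPU cache, depending on how the data is laid out.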

Answered by KeithS


If you can rewrite your main loop as a foreach over an IEnumerable, you can use PLINQ to parallelize it. You can then use WithDegreeOfParallelism to control how many cores your application will use. You can avoid some of the "lag" you experience by not using all the cores on your computer. Also, you don't have to work out how to partition your loop across threads to avoid unnecessary resource contention; PLINQ does all that for you.

Assuming you have this very simple single-threaded loop:

var arrayOfStuff = new[] { ... };
for (var i = 0; i < arrayOfStuff.Length; ++i)
  DoSomething(arrayOfStuff[i]);

If ordering doesn't matter, you can parallelize it with PLINQ, using one core fewer than is available:

var cores = Math.Max(1, Environment.ProcessorCount - 1);
arrayOfStuff.AsParallel().WithDegreeOfParallelism(cores).ForAll(DoSomething);

Even if your main loop is more complex, you can rewrite it into an iterator block that you can then parallelize:

IEnumerable<Stuff> GetStuff() {
  for ( ... very complex looping ... ) {
    ...
    yield return stuff;
  }
}
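
The resulting sequence can then be consumed exactly like the array above (reusing the cores value from the earlier snippet):

GetStuff().AsParallel().WithDegreeOfParallelism(cores).ForAll(DoSomething);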
Answered by Martin Liversage