Threads vs Processes in .NET

I have a long-running process that reads large files and writes summary files. To speed things up, I'm processing multiple files simultaneously using regular old threads:

ThreadStart ts = new ThreadStart(Work);
Thread t = new Thread(ts);
t.Start();

What I've found is that even with separate threads reading separate files and no locking between them and using 4 threads on a 24-core box, I can't even get up to 10% on the CPU or 10% on disk I/O. If I use more threads in my app, it seems to run even more slowly.

I'd guess I'm doing something wrong, but where it gets curious is that if I start the whole exe a second and third time, then it actually processes files two and three times faster. My question is, why can't I get 12 threads in my one app to process data and tax the machine as well as 4 threads in 3 instances of my app?

I've profiled the app and the most time-intensive and frequently called functions are all string processing calls.

powlette asked Sep 29 '11 13:09



1 Answer

It's possible that your computing problem is not CPU bound, but I/O bound. Stating that your disk I/O is "only at 10%" doesn't tell us much; I'm not sure such a performance counter even exists.

It gets slower with more threads because those threads are all trying to reach their respective files at the same time, and the disk subsystem has a hard time accommodating all of them. Even with modern technology like SSDs, where seek time is several orders of magnitude smaller than on traditional hard drives, there is still a penalty involved.

Rather, you should conclude that your problem is disk bound, and that a single thread will probably be the fastest way to solve it.
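One rough way to check whether the work really is disk bound is to time a plain read of a file and compare it with the time your existing processing takes for the same file. The following is only a sketch: the class name IoVersusCpu and both method names are made up for illustration, and the 1 MB buffer size is an arbitrary assumption.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not part of the original question or answer.
public static class IoVersusCpu
{
    // Reads the whole file in fixed-size chunks and discards the data.
    // The elapsed time approximates the pure I/O cost for that file.
    public static TimeSpan TimeRawRead(string path, int bufferSize = 1 << 20)
    {
        var buffer = new byte[bufferSize];
        var sw = Stopwatch.StartNew();
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, bufferSize,
                                       FileOptions.SequentialScan))
        {
            // Keep reading until Read reports end of file (0 bytes).
            while (fs.Read(buffer, 0, buffer.Length) > 0) { }
        }
        sw.Stop();
        return sw.Elapsed;
    }

    // Times an arbitrary action, for example the existing Work method
    // applied to a single file.
    public static TimeSpan Time(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

If the raw read takes nearly as long as the full processing, the disk is the limit. If the raw read is much faster, the time is going somewhere else. Keep in mind that the operating system caches files, so a second read of the same file can look far faster than the first; use a file that has not been touched recently for the comparison.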

One could argue that you could use asynchronous techniques to process a chunk that has been read while the next chunk is being read in the background, but I think you'll see very little performance improvement there.
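For anyone who wants to try that anyway, the idea can be sketched with two buffers: while one chunk is being processed, the read of the next chunk is already in flight. This is a minimal illustration assuming a .NET version with async/await; OverlappedReader, ProcessFileAsync and the process callback are hypothetical names, not code from the question.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper illustrating overlapped read and processing.
public static class OverlappedReader
{
    // Reads 'path' chunk by chunk. While 'process' works on one chunk,
    // the read of the next chunk has already been started.
    // Returns the total number of bytes handed to 'process'.
    public static async Task<long> ProcessFileAsync(
        string path, Action<byte[], int> process, int bufferSize = 1 << 20)
    {
        var current = new byte[bufferSize];
        var next = new byte[bufferSize];
        long total = 0;

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, bufferSize,
                                       FileOptions.Asynchronous | FileOptions.SequentialScan))
        {
            int read = await fs.ReadAsync(current, 0, current.Length);
            while (read > 0)
            {
                // Start the next read before processing the current chunk.
                Task<int> pending = fs.ReadAsync(next, 0, next.Length);

                // Only the first 'read' bytes of the buffer are valid.
                process(current, read);
                total += read;

                // Wait for the read that was in flight, then swap buffers.
                read = await pending;
                var tmp = current;
                current = next;
                next = tmp;
            }
        }
        return total;
    }
}
```

Note that chunks are cut at arbitrary byte offsets, so line-oriented string processing would still need to handle a record that straddles two chunks.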

I had a similar problem not too long ago in a small tool that calculated MD5 signatures of all the files on my hard drive. I found that the CPU is far too fast compared to the storage system, and I got similar results when trying to get more performance by using more threads.

Using the Task Parallel Library didn't alleviate this problem.
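To give an idea of the kind of work involved, hashing one file sequentially looks roughly like the sketch below. This is not the original tool; Md5Tool and HashFile are names invented for the example, and the 1 MB buffer is an assumption.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, written for illustration only.
public static class Md5Tool
{
    // Streams the file through MD5 and returns the digest as lowercase hex.
    public static string HashFile(string path)
    {
        using (var md5 = MD5.Create())
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 1 << 20,
                                       FileOptions.SequentialScan))
        {
            byte[] hash = md5.ComputeHash(fs);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Calling HashFile for each file in turn on a single thread keeps the reads sequential, which is the access pattern a disk handles best.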

Dave Van den Eynde answered Nov 15 '22 06:11