
Multithreading vs. Multi-Instancing - Which to choose?

Will there be a big difference between these two scenarios?

  1. One instance of the application creates 100 threads to process some jobs
  2. 10 instances of the same application create 10 threads each to process jobs (100 in total)

The number of threads is the same in both cases. Does one approach offer better performance or any other advantage over the other?

An instance here is, for example, a console application, so in the second case there will be 10 console applications running. Each application has its own folder.

Yaplex asked Feb 27 '12 22:02


3 Answers

A thread uses fewer resources than a process, so theoretically option 1 would be "better". However, you probably won't notice much difference between the two, because 100 separate threads or processes all running simultaneously and fighting for the same OS resources are likely to spend more time contending with each other than doing useful work.

I would choose option 3 - one process containing a fairly small thread pool. That way, some jobs will execute simultaneously and the rest will queue up and wait their turn. This approach also scales well if a very large number of jobs are going to be run.

See the ThreadPool class or, preferably, one of the many higher-level abstractions built on top of it (e.g. the Task Parallel Library, or even plain old asynchronous delegates).
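As a rough illustration of option 3, here is a minimal sketch that runs 100 jobs with a capped degree of parallelism using `Parallel.For` from the Task Parallel Library. `ProcessJob` is a hypothetical stand-in for the real job, and capping at `Environment.ProcessorCount` is only an assumption about a sensible starting point, not a recommendation from the question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PoolSketch
{
    // Hypothetical job: stands in for whatever each of the 100 jobs really does.
    private static int ProcessJob(int jobId)
    {
        Thread.Sleep(10); // simulate a little work
        return jobId * 2;
    }

    // Runs jobCount jobs with at most maxParallel of them executing at once.
    public static int[] RunJobs(int jobCount, int maxParallel)
    {
        var results = new int[jobCount];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Iterations run on thread-pool threads (plus the calling thread);
        // iterations beyond the limit wait their turn.
        Parallel.For(0, jobCount, options, i => results[i] = ProcessJob(i));
        return results;
    }

    public static void Main()
    {
        int[] results = RunJobs(100, Environment.ProcessorCount);
        Console.WriteLine($"Completed {results.Length} jobs");
    }
}
```

Each iteration writes to its own array slot, so no locking is needed here. If the jobs mostly wait on I/O rather than use the CPU, a higher limit may work better, so treat the cap as something to measure.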

Christian Hayter answered Nov 18 '22 12:11


Option 2 has (at least) the following overheads:

  • More process objects
  • More per-process static memory
  • More instances of the CLR and of jitted code
  • Context switches between processes have to switch the address space (much more expensive than switching between threads of one process)
  • Fewer opportunities to share application data structures
  • You need cross-process communication. Simple method calls become IPC operations
  • More work for you
  • More opportunities for bugs (IPC, fork bombs, ...)
  • Worse debuggability
  • No built-in load balancing through the thread pool
  • Harder and more error-prone synchronization: fewer built-in primitives, and the cross-process ones are slower

Why would you choose (2) if you can choose (1)? There are valid reasons, but those are rather special:

  • You need to be able to tolerate arbitrary memory corruption. This is not normally the case (not at all!)
  • You need the ability to kill threads. This cannot be done reliably inside the CLR for individual threads, but you could do it cooperatively, which is usually the better option anyway
  • Your threads need to run under different user accounts or similar. This almost never happens.
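To show what stopping a thread "cooperatively" can look like, here is a minimal sketch using `CancellationTokenSource`. The `Work` method and its timings are hypothetical; the point is only that the worker checks the token at a safe point and exits by itself instead of being killed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancelSketch
{
    // Hypothetical worker: loops until asked to stop and
    // returns how many units of work it finished.
    public static int Work(CancellationToken token)
    {
        int completed = 0;
        while (!token.IsCancellationRequested)
        {
            Thread.Sleep(5); // one small unit of work
            completed++;
        }
        return completed; // leaves at a safe point of its own choosing
    }

    public static void Main()
    {
        using (var cts = new CancellationTokenSource())
        {
            Task<int> worker = Task.Run(() => Work(cts.Token));
            Thread.Sleep(100);
            cts.Cancel(); // a request, not a forced kill
            Console.WriteLine($"Worker stopped after {worker.Result} units");
        }
    }
}
```

This only works if the worker checks the token reasonably often. Code that blocks indefinitely without observing the token will not stop, which is the case where a separate process that can be killed still has an edge.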

In general, the fewer processes the better.

usr answered Nov 18 '22 12:11


It depends on what you are doing, but in most cases Option 1 will have the best performance and will be the easiest to work with.
To give you a more complete answer I would need to know the following:

  • Are the 100 threads all performing the same task?
  • Are the 100 threads accessing the same data?
  • Are the tasks handled by the threads going to have natural down time (waiting for another process to finish or a resource to become available)?
  • Are the tasks handled by the threads all going to try to access a limited resource (like the hard disk or network card)?
  • How many threads can your hardware run simultaneously (for example, a 4-core processor with Hyper-Threading can run 8 threads at once; a 4-core processor without it can run 4)?
  • What happens if something goes wrong on a thread? Does the process crash, is the thread restarted?

If the threads are all performing the same task, keeping them together will make it easier on the end user and on later developers, as everything is in one place.

If the threads are all accessing the same data, keeping them in the same process lets you share that data between threads (though watch out for race conditions when changing the data) and reduces the memory footprint. You might also be able to arrange for threads to work on data from the same blocks so that it stays in the CPU cache, reducing the effect of memory latency, though this is not something I would recommend attempting.
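As a small illustration of the race-condition warning, here is a sketch in which several threads update one shared counter. The class and method names are made up for the example. A plain `total++` from multiple threads can lose updates, so the sketch uses `Interlocked.Increment`, which performs the read-modify-write atomically.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SharedCounterSketch
{
    // Several workers increment one shared counter.
    public static long CountWithInterlocked(int workers, int incrementsPerWorker)
    {
        long total = 0;
        Parallel.For(0, workers, w =>
        {
            for (int i = 0; i < incrementsPerWorker; i++)
            {
                // Atomic increment; a plain "total++" here could lose updates.
                Interlocked.Increment(ref total);
            }
        });
        return total; // safe to read: Parallel.For has finished
    }

    public static void Main()
    {
        Console.WriteLine(CountWithInterlocked(8, 10000));
    }
}
```

For anything more complex than a counter, for example updating a shared dictionary, a `lock` around the update or one of the concurrent collections is usually the simpler choice.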

Since many of the answers give advice on how to implement your project, it would help to know whether each thread is designed to use the CPU fully the whole time it runs, or whether these are background tasks that do a small amount of work before going back to sleep. That would let us tailor suggestions to your situation.

Knowing what hardware the process will be running on will help us provide suggestions correct for your situation.
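To make the hardware and workload questions concrete, here is a rough sketch of how a worker count might be derived from the number of logical processors. The method name and the `ioFactor` multiplier are assumptions made for illustration only; the right numbers depend on your jobs and should be measured.

```csharp
using System;

public static class SizingSketch
{
    // Hypothetical heuristic: CPU-bound work gets one worker per logical
    // processor; work that mostly waits gets a multiple of that.
    // ioFactor is an assumed tuning knob, not an established constant.
    public static int SuggestWorkerCount(bool cpuBound, int ioFactor = 4)
    {
        int logicalProcessors = Environment.ProcessorCount;
        return cpuBound ? logicalProcessors : logicalProcessors * ioFactor;
    }

    public static void Main()
    {
        Console.WriteLine($"Logical processors: {Environment.ProcessorCount}");
        Console.WriteLine($"CPU-bound workers:  {SuggestWorkerCount(true)}");
        Console.WriteLine($"I/O-bound workers:  {SuggestWorkerCount(false)}");
    }
}
```

`Environment.ProcessorCount` reports logical processors, so a 4-core machine with Hyper-Threading reports 8.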

If a thread fails, what happens? If a thread fails once a day, is a user required to intervene, stop the process, and restart it? If so then any unsaved work done on the other threads would be lost. In this case, having each thread run in its own process would give you the benefit of only losing the process which failed.


Christian Hayter's option 3 makes sense, but is not always relevant with C#.
If you look at the documentation, it states:

An operating-system ThreadId has no fixed relationship to a managed thread, because an unmanaged host can control the relationship between managed and unmanaged threads. Specifically, a sophisticated host can use the CLR Hosting API to schedule many managed threads against the same operating system thread, or to move a managed thread between different operating system threads.

Basically, this means the runtime host can map several managed threads onto the same operating-system thread if it decides that is a good idea. Note that the quoted passage describes what a host is allowed to do, not what every host does, so check before relying on it. Where it does happen, it is more likely when a process uses more threads, while the total number of managed threads stays much the same between the two designs. As a result I would expect the 1-process, 100-thread solution to use fewer operating-system threads than the 10-process, 10-threads-each solution (something like 10 to 40, but you would have to check).

That being said, the framework will be guessing, so in some cases thread pools will be a better option. Be sure to read the documentation first, as there are some cases where thread pools should not be used. A quick tutorial on pools can be found on MSDN. There is also a thread which discusses when to use thread pools.


If you provide more information, I will attempt to give a more accurate answer. Otherwise, option 1 (and possibly option 3) is the better choice in most situations.

Trisped answered Nov 18 '22 12:11