I understand that if your program has large segments that can be executed in parallel, it is beneficial to spawn new threads as long as the parallel instances are not all bound by a single resource. An example of this would be a web server handling page requests.
Threads are beneficial in that inter-thread communication is much less costly and context switching is much faster.
Processes give you more security in that one process cannot "mess" with another process's contents, whereas if one thread crashes it is likely that all threads within that process will crash.
My question is, what are some examples as to when you would want to use a process (for example by fork() in C)?
One case I can think of: if you have a program that wants to launch another program, it would make sense to encapsulate that in a new process. But I feel that I am missing some larger reason for starting a new process.
Specifically, when does it make sense to have one program spawn a new process vs thread?
On a multiprocessor system, multiple threads can concurrently run on multiple CPUs. Therefore, multithreaded programs can run much faster than on a uniprocessor system. They can also be faster than a program using multiple processes, because threads require fewer resources and generate less overhead.
When it comes to processes, the OS usually protects them from one another. Even if one of them corrupts its own memory space, other processes are not affected. Another benefit of using processes over threads is that they can run on different machines. On the other hand, threads normally have to run on the same machine.
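To make the isolation point concrete, here is a minimal sketch. It uses Python's os.fork, which wraps the same fork() call mentioned in the question, so it assumes a POSIX system. The function name and the dictionary are made up for illustration. The child overwrites a value in its own copy of the address space, and the parent never sees the change.

```python
import os


def child_write_is_invisible_to_parent():
    """Fork a child that overwrites a value; return what the parent still sees."""
    data = {"value": "original"}

    pid = os.fork()
    if pid == 0:
        # Child process: this change lands in the child's own copy of memory.
        data["value"] = "changed by child"
        os._exit(0)  # leave immediately, skipping the parent's cleanup code

    # Parent process: wait for the child to finish, then look at our own copy.
    os.waitpid(pid, 0)
    return data["value"]


if __name__ == "__main__":
    print(child_write_is_invisible_to_parent())
```

If the same assignment were made from a second thread instead of a forked child, the parent's dictionary would change, since threads share one address space.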
You'd prefer multiple threads over multiple processes for two reasons: inter-thread communication (sharing data etc.) is significantly simpler to program than inter-process communication, and context switches between threads are faster than between processes.
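A rough side-by-side of the "simpler to program" point, assuming a POSIX system for the fork-based half. The names via_thread and via_process are hypothetical. The thread hands its result back through an ordinary list, while the process needs an explicit channel (a pipe here) and has to turn the data into bytes.

```python
import os
import threading


def via_thread():
    """A thread shares the caller's memory, so a plain list is enough."""
    results = []
    worker = threading.Thread(target=lambda: results.append("hello"))
    worker.start()
    worker.join()
    return results[0]


def via_process():
    """A forked process has its own memory, so the data travels through a pipe."""
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: send the message as bytes, then exit (which closes the pipe).
        os.close(read_fd)
        os.write(write_fd, b"hello")
        os._exit(0)

    # Parent: close the unused write end so read() sees end-of-file.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        message = reader.read()
    os.waitpid(pid, 0)
    return message.decode()


if __name__ == "__main__":
    print(via_thread(), via_process())
```

The thread version is shorter, but it is also the one that needs locking as soon as two threads touch the same object at once.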
Threads in a process can execute different parts of the program code at the same time. They can also execute the same parts of the code at the same time, but with different execution state: They have independent current instructions; that is, they have (or appear to have) independent program counters.
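As a small illustration of "same code, different execution state", the sketch below starts several threads on one function. All names are invented for the example. Each thread has its own local variables and its own position in the loop, and writes its answer to its own slot of a shared list.

```python
import threading


def count_up(limit, results, index):
    """Sum 0..limit-1; 'total' and 'i' are private to the calling thread."""
    total = 0
    for i in range(limit):
        total += i
    results[index] = total  # each thread writes only to its own slot


def run_same_code_in_threads(limits):
    """Run count_up once per limit, each call in its own thread."""
    results = [None] * len(limits)
    threads = [
        threading.Thread(target=count_up, args=(limit, results, index))
        for index, limit in enumerate(limits)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_same_code_in_threads([3, 5, 10]))  # [3, 10, 45]
```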
The main reason for using processes is that a process can crash or go crazy, and the OS will limit the effect this has on other processes. So, for example, Firefox has recently started running plugins in separate processes, IIRC Chrome runs different pages in different processes, and web servers have for a long time handled individual requests in separate processes.
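A minimal sketch of that containment, assuming a POSIX system. The function name is hypothetical and os.abort() merely stands in for a real crash. The child dies from a signal, and the parent carries on and can inspect what happened.

```python
import os
import signal


def run_crashing_child():
    """Fork a child that aborts; return the signal that killed it, or None."""
    pid = os.fork()
    if pid == 0:
        # Child: simulate a crash. The process is killed by SIGABRT.
        os.abort()

    # Parent: unaffected by the crash; collect the child's exit status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return None


if __name__ == "__main__":
    killed_by = run_crashing_child()
    print("parent still running; child was killed by signal", killed_by)
    print("that is SIGABRT:", killed_by == signal.SIGABRT)
```

Had the same abort happened in a thread, the whole process would have gone down with it, including whichever thread was supposed to do the reporting.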
There are a few different ways in which OSes apply limits:
Another potential reason for using processes is that it makes it easier to reason about your code. In multi-threaded code you rely on invariants of all your classes to deduce that access to a particular object is serialized; if your code isn't multi-threaded then you know that it is[*]. It's possible to do this with multi-threaded code as well, of course: just make sure you know what thread "owns" each object, and never access an object from a thread that isn't its owner. Process boundaries enforce this rather than just designing for it. I'm not certain that this is the motivation, but for example the World Community Grid client can use multiple cores. In that mode it runs multiple processes with a completely different task in each, so it gets the performance benefits of the additional cores without any individual task needing to be parallelizable, and without the code for any task needing to be thread-safe.
[*] well, as long as it wasn't created in shared memory. You also need to avoid unexpected recursive calls and the like, but that's usually a simpler problem than synchronizing multi-threaded code.
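The one-task-per-process pattern could look roughly like the sketch below. It is not how the World Community Grid client is actually written. The task functions and the helper names are invented, and the "fork" start method is chosen explicitly, which assumes a POSIX system. Each task is ordinary single-threaded code, and only its result crosses the process boundary.

```python
import multiprocessing as mp


def word_count(text):
    """An ordinary single-threaded task."""
    return len(text.split())


def character_count(text):
    """A completely different task, equally unaware of the others."""
    return len(text)


def _run_task(task, argument, conn):
    # Runs inside the child process: compute, send the result back, finish.
    conn.send(task(argument))
    conn.close()


def run_in_separate_processes(jobs):
    """Run each (task, argument) pair in its own process; results keep job order."""
    ctx = mp.get_context("fork")
    workers = []
    for task, argument in jobs:
        receive_end, send_end = ctx.Pipe(duplex=False)
        process = ctx.Process(target=_run_task, args=(task, argument, send_end))
        process.start()
        send_end.close()  # the parent only reads from this pipe
        workers.append((process, receive_end))

    results = []
    for process, receive_end in workers:
        results.append(receive_end.recv())
        process.join()
    return results


if __name__ == "__main__":
    jobs = [(word_count, "a b c"), (character_count, "hello")]
    print(run_in_separate_processes(jobs))  # [3, 5]
```

Neither task function contains a lock or any other synchronization, because no object is ever reachable from two tasks at once.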