
Motivation for spawning a new process vs. a thread

I understand that if your program has large segments that can be executed in parallel, it is beneficial to spawn new threads, provided the instances are not bound by a single resource. An example would be a web server handling page requests.

Threads are beneficial in that inter-thread communication is much less costly and context switching is much faster.

Processes give you more security in that one process cannot "mess" with another process's contents, whereas if one thread crashes it is likely that all threads within that process will crash.

My question is, what are some examples as to when you would want to use a process (for example by fork() in C)?

I can see that if a program wants to launch another program, it makes sense to encapsulate that in a new process, but I feel that I am missing some larger reason for starting a new process.

Specifically, when does it make sense to have one program spawn a new process vs thread?
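
For reference, the launch-another-program case mentioned above usually looks something like the following on a POSIX system. This is only a minimal sketch: `run_program` is a hypothetical helper name, and returning 127 when the program cannot be started simply follows the shell convention.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] (looked up on PATH) in a new process
 * and waits for it. Returns the program's exit status, 127 if the program
 * could not be started, or -1 if fork/waitpid failed or the child was
 * killed by a signal. */
int run_program(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the requested program. */
        execvp(argv[0], argv);
        _exit(127); /* only reached if execvp failed */
    }

    /* Parent: wait for the child and report how it ended. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A caller would pass a NULL-terminated argument vector, for example `char *args[] = {"ls", "-l", NULL};` followed by `run_program(args)`.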

asked Apr 30 '11 at 04:04 by Bob



1 Answer

The main reason for using processes is that a process can crash or go crazy, and the OS will limit the effect this has on other processes. For example, Firefox has recently started running plugins in separate processes, IIRC Chrome runs different pages in different processes, and web servers have long handled individual requests in separate processes.

There are a few different ways in which OSes apply limits:

  • Crashes - as you note, if a thread crashes it generally takes down the whole process. This motivates the browser process boundaries: browsers and browser plugins are complex bits of code subject to constant attack, so it makes sense to take unusual precautions.
  • Resource limits. If a thread in your process opens a lot of files, allocates a lot of memory, etc., then it affects you. What another process does needn't affect you, because it can be limited separately. So each request in a web server might be more limited in its resource usage than the server as a whole, because you want your server to serve multiple requests simultaneously without any one remote user hogging resources.
  • Capabilities. Varies by OS, but just for example you can run a process in a chroot jail to ensure that it doesn't modify or read files it shouldn't, no matter how vulnerable your code is to exploits. For another example, SymbianOS has an explicit list of permissions to do various things with the system ("read user phonebook", "write user phonebook", "decrypt DRM files" and so on). There's no way to surrender permissions that your process has, so if you want to do something highly sensitive, and then fall back to a low-sensitivity mode, you need a process boundary somewhere. One reason to want to do this is security - unknown code or code that might contain security flaws can be somewhat sandboxed, and a smaller quantity of code that isn't limited can be subjected to increased scrutiny. Another reason is simply to have the OS enforce certain aspects of your design.
  • Drivers. In general, a device driver controls shared access to a unique system resource. As with capabilities, restricting this access to a single driver process means you can forbid it to all the other processes. For example IIRC TrueCrypt on Windows installs a driver that has enhanced permissions that allow it to register an encrypted container with a drive letter and then act like any other Windows filesystem. The GUI part of the app runs in regular user mode. I'm not sure whether filesystem drivers on Windows actually need an associated process, but device drivers in general might do, so even if this isn't a good example hopefully it gives the idea.
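
The first two points (crashes and resource limits) can be sketched in C on a POSIX system roughly as follows. The names `run_isolated`, `crashing_task` and `greedy_task`, the limit of 16 open files, and the exit code 42 are all made up for illustration; real servers and browsers do considerably more than this.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>       /* SIGABRT, for callers inspecting the result */
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs task() in a child process whose soft open-file limit is capped at
 * max_files. Returns the child's exit status, the negated signal number if
 * a signal killed it, or -1000 if fork/waitpid failed. */
int run_isolated(int (*task)(void), rlim_t max_files)
{
    assert(task != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1000;

    if (pid == 0) {
        /* Child: lower its own limit; the parent's limit is untouched. */
        struct rlimit rl;
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
            _exit(127);
        if (max_files < rl.rlim_cur)
            rl.rlim_cur = max_files;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            _exit(127);
        _exit(task());
    }

    /* Parent: survives whatever the child does, and learns how it ended. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1000;
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return WEXITSTATUS(status);
}

/* Illustrative task that crashes: abort() raises SIGABRT. */
int crashing_task(void)
{
    abort();
    return 0; /* not reached */
}

/* Illustrative task that hogs file descriptors until the limit stops it. */
int greedy_task(void)
{
    for (int i = 0; i < 64; i++) {
        if (open("/dev/null", O_RDONLY) < 0)
            return errno == EMFILE ? 42 : 1;
    }
    return 0;
}
```

With this sketch, `run_isolated(crashing_task, 16)` reports `-SIGABRT` to a parent that keeps running, and `run_isolated(greedy_task, 16)` reports 42 because the child hit its own descriptor limit. Had either task run in a thread instead, the crash would have taken the whole process down and the descriptors would have counted against the shared limit.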

Another potential reason for using processes is that it makes it easier to reason about your code. In multi-threaded code you rely on invariants of all your classes to deduce that access to a particular object is serialized: if your code isn't multi-threaded then you know that it is[*]. It's possible to get the same guarantee in multi-threaded code, of course: just make sure you know which thread "owns" each object, and never access an object from a thread that isn't its owner. Process boundaries enforce this ownership rather than leaving it to design discipline. Again, I'm not certain that this is the motivation, but for example the World Community Grid client can use multiple cores. In that mode it runs multiple processes with a completely different task in each, so it gets the performance benefits of the additional cores without any individual task needing to be parallelizable, or the code for any task needing to be thread-safe.

[*] Well, as long as the object wasn't created in shared memory. You also need to avoid unexpected recursive calls and the like, but that's usually a simpler problem than synchronizing multi-threaded code.
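
To make the ownership point concrete, here is a small C sketch for a POSIX system; `counter` and `counter_after_child_work` are illustrative names only. After fork() the child works on its own copy of ordinary memory, so nothing it writes is visible to the parent and no locking is needed. The footnote's caveat still applies: memory explicitly set up as shared would behave differently.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0; /* ordinary global, not placed in shared memory */

/* Forks a child that increments `counter` n times, waits for it, and
 * returns the value of `counter` that the parent observes afterwards. */
int counter_after_child_work(int n)
{
    int start = counter;

    pid_t pid = fork();
    assert(pid >= 0);

    if (pid == 0) {
        /* Child: modifies only its private copy of the global. */
        for (int i = 0; i < n; i++)
            counter++;
        _exit(counter == start + n ? 0 : 1);
    }

    /* Parent: the child's writes never reach this address space. */
    int status;
    pid_t waited = waitpid(pid, &status, 0);
    assert(waited == pid);
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
    return counter;
}
```

Calling `counter_after_child_work(1000)` leaves the parent's `counter` unchanged, whereas a thread running the same loop would have been mutating the one shared variable.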

answered Oct 12 '22 at 11:10 by Steve Jessop