I am designing a dedicated syslog-processing daemon for Linux that needs to be robust and scalable, and I'm debating a multithreaded vs. a multiprocess design.
The obvious objection to multithreading is complexity and nasty bugs. Multiple processes may hurt performance because of IPC overhead and context switching.
"The Art of Unix Programming" discusses this here.
Would you recommend a process-based system (like Apache) or a multi-threaded approach?
Multiprocessing is used to build a more reliable system, because a crash in one process does not take the others down with it, whereas multithreading runs several threads in parallel inside a single process. Threads are quick to create and need few resources, whereas creating a process takes more time and more resources.
Threads are faster to start than processes and faster to switch between. All threads in a process share that process's memory, which makes exchanging data between them cheap. Creating a new thread in an existing process takes less time than creating a new process.
A multiprocessing system has two or more processors, whereas multithreading is an execution technique that lets a single process run multiple code paths concurrently. Multiprocessing improves the system's reliability, while in multithreading the threads of one process run in parallel with each other.
Creating and switching between processes is comparatively slower; doing the same with threads is comparatively faster. Terminating a process also takes more time than terminating a thread.
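To make the shared-memory point concrete, here is a minimal sketch (not from the original posts) that compares a thread with a forked child on Linux. The names `bump`, `thread_update_visible` and `child_update_visible` are made up for illustration, and it assumes a POSIX system where `os.fork` is available.

```python
import os
import threading


def bump(state):
    """Increment a counter stored in a plain dict."""
    state["value"] += 1


def thread_update_visible():
    """A thread shares the caller's memory, so its update is seen here."""
    state = {"value": 0}
    t = threading.Thread(target=bump, args=(state,))
    t.start()
    t.join()
    return state["value"]  # 1: the thread changed our dict


def child_update_visible():
    """A forked child works on its own copy of the address space."""
    state = {"value": 0}
    pid = os.fork()
    if pid == 0:
        # Child process: modify its private copy, then exit immediately.
        bump(state)
        os._exit(0)
    os.waitpid(pid, 0)  # Parent: wait for the child to finish.
    return state["value"]  # 0: the child's change never reaches us
```

This is why threads need locking around shared data while processes need explicit IPC to exchange anything at all.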
Both of them can be complicated and complex in their own ways.
You can do either. In the grand scheme of things, it might not matter which you choose. What does matter is how well you do them. Therefore:
Do what you are most experienced with. Or, if you're leading a team, do what the team is most experienced with.
---Threading!---
I have done a lot of threaded programming, and I enjoy parts of it, and parts of it I do not enjoy. I've learned a lot, and now can usually write a multi-threaded application without too much pain, but it does have to be written in a very specific way. Namely:
1) It has to be written with very clearly defined data boundaries that are 100% thread safe. Otherwise, whatever can go wrong will go wrong, and it might not be when you have a debugger lying around. Plus, debugging threaded code is like peering into Schrödinger's box: while you are looking in there, other threads may or may not have had time to process more.
2) It has to be written with test code that stresses the machine. Many multi-threaded systems only show their bugs when the machines are heavily stressed.
3) There has to be some very smart person who owns the data exchanging code. If there is any way for a shortcut to be made, some developer will probably make it, and you will have an errant bug.
4) There have to be catch-all handlers that reset the application with a minimum of fuss. This is for the production code that breaks because of some threading issue. In short: The show must go on.
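As a rough sketch of rules 1 and 4 (my illustration, not a definitive implementation): the only data that crosses thread boundaries goes through `queue.Queue`, which is thread safe, and each worker has a catch-all so one bad line does not stop the show. The `parse_line` format ("facility: message") and all function names are assumptions made up for the example.

```python
import queue
import threading


def parse_line(line):
    """Hypothetical parser: split 'facility: message' into a tuple."""
    facility, _, message = line.partition(": ")
    if not message:
        raise ValueError(f"malformed line: {line!r}")
    return facility, message


def worker(inbox, results, errors):
    """Consume lines from the inbox until the None sentinel arrives."""
    while True:
        line = inbox.get()
        if line is None:
            break
        try:
            results.put(parse_line(line))
        except Exception as exc:
            # Catch-all: record the failure and keep the worker alive.
            errors.put((line, repr(exc)))


def drain(q):
    """Empty a queue into a list (called only after workers have exited)."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def process(lines, n_workers=4):
    """Fan lines out to worker threads; queues are the only shared state."""
    inbox, results, errors = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, results, errors))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for line in lines:
        inbox.put(line)
    for _ in threads:
        inbox.put(None)  # one stop sentinel per worker
    for t in threads:
        t.join()
    return drain(results), drain(errors)
```

For rule 2, calling `process` with a very large input and many workers is a cheap way to start stressing the boundary, though a real stress test would also load the machine itself.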
---Cross-Process!---
I have less experience with process-based designs, but have recently been doing some cross-process work on Windows (where the IPC is web service calls... WOO!), and it is relatively clean and simple, but I follow some rules here as well. By and large, interprocess communication tends to be more error-free, because programs are generally good at receiving input from the outside world, and those transport mechanisms are usually asynchronous. Anyway...
1) Define clear process boundaries and communication mechanisms. Message/eventing via, oh say, TCP or web services or pipes or whatever is fine, as long as the borders are clear, and there is a lot of validation and error checking code at those borders.
2) Be prepared for bottlenecks. Code forgiveness is very important. By this I mean, sometimes you won't be able to write to that pipe. You have to be able to requeue and retry those messages without the application locking up/tossing an exception.
3) There will be a lot more code in general, because transporting data across process boundaries means you have to serialize it in some fashion. This can be a source of problems, especially when you start maintaining and changing that code.
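The three rules above can be sketched roughly as follows (an illustration under stated assumptions, not a definitive implementation). Messages are serialized as newline-delimited JSON, validated when they cross the boundary, and requeued when a write fails. The schema in `REQUIRED_FIELDS`, the function names, and the `write` callable (assumed to either send a whole message or raise) are all hypothetical.

```python
import json
from collections import deque

# Hypothetical message schema: field name -> required type.
REQUIRED_FIELDS = {"host": str, "severity": int, "text": str}


def encode(msg):
    """Serialize one message as a single line of UTF-8 JSON."""
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")


def decode(raw):
    """Parse and validate one message at the process boundary."""
    try:
        msg = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"undecodable message: {exc}") from exc
    if not isinstance(msg, dict):
        raise ValueError("message must be a JSON object")
    for field, kind in REQUIRED_FIELDS.items():
        if not isinstance(msg.get(field), kind):
            raise ValueError(f"missing or invalid field: {field}")
    return msg


def flush(pending, write, max_attempts=3):
    """Send queued messages in order; anything unsent stays queued.

    Gives up after max_attempts consecutive failures on one message
    and returns True only if the queue was fully drained.
    """
    attempts = 0
    while pending and attempts < max_attempts:
        try:
            write(encode(pending[0]))
        except (BlockingIOError, BrokenPipeError):
            attempts += 1  # leave the message at the head and retry
            continue
        pending.popleft()
        attempts = 0
    return not pending
```

A caller would keep a `deque` of outgoing messages and call `flush` whenever the pipe or socket becomes writable again, so a temporary bottleneck neither loses messages nor locks up the application.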
Hope this helps.