 

Concurrency and Multithreading

I'm not very experienced with subjects such as concurrency and multithreading. In fact, in most of my web-development career I have never needed to touch them.

I feel like it's an important concept, especially for desktop applications and basically any other application that doesn't generate HTML :).

After reading a bit about concurrency, it seems to be better supported in languages like Go (Google's programming language), and I don't quite understand why one language would be better than another at a concept like concurrency, since it's basically about being able to fork() processes and compute things in parallel, right? Isn't that how programming works?

Multithreading seems to be a branch of concurrency, as it allows you to run things in parallel within the same process, although how it's implemented seems to be platform-specific.

I guess my question is: why are some languages better at concurrency than others, and why would fork()ing processes be a better solution than just using threads?

Asked Jan 03 '10 by Luca Matteis


3 Answers

Well, for one thing, multiple threads are not the same as multiple processes, so fork() really does not apply here.

Multithreading/parallel processing is hard. First you have to figure out how to actually partition the task to be done. Then you have to coordinate all of the parallel bits, which may need to talk to each other or share resources. Then you need to consolidate the results, which in some cases can be every bit as difficult as the previous two steps. I'm simplifying here, but hopefully you get the idea.
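
To make those three steps concrete, here is a minimal Go sketch (Go being one of the languages the question mentions) that partitions a sum over a slice among a few workers, coordinates them with a channel, and consolidates the partial results. The chunking scheme and worker count are illustrative choices, not the only way to do it.

```go
package main

import "fmt"

// sumChunk computes a partial result for one slice of the work (the
// "partition" step) and reports it on a channel (the "coordinate" step).
func sumChunk(nums []int, results chan<- int) {
	total := 0
	for _, n := range nums {
		total += n
	}
	results <- total
}

func main() {
	nums := make([]int, 1000)
	for i := range nums {
		nums[i] = i + 1
	}

	const workers = 4
	chunk := len(nums) / workers
	results := make(chan int, workers)

	// Partition: hand each worker its own slice of the input.
	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if w == workers-1 {
			hi = len(nums) // last worker takes the remainder
		}
		go sumChunk(nums[lo:hi], results)
	}

	// Consolidate: merge the partial sums into the final answer.
	total := 0
	for w := 0; w < workers; w++ {
		total += <-results
	}
	fmt.Println(total) // 500500
}
```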

So your question is, why would some languages be better at it? Well, several things can make it easier:

  • Optimized immutable data structures. You want to stick to immutable structures whenever possible in parallel processing, because they are much easier to reason about. Some languages have better support for these, and some have various optimizations, such as the ability to splice collections together without any actual copying while still enforcing immutability. You can always build your own structures like these, but it's easier if the language or framework does it for you.

  • Synchronization primitives and ease of using them. When different threads do share state, they need to be synchronized, and there are many ways to accomplish this. The wider the array of sync primitives you get, the easier your task will ultimately be: performance will take a hit if you have to synchronize with an exclusive critical section where a reader-writer lock would do (see the first sketch after this list).

  • Atomic transactions. Even better than a wide array of sync primitives is not having to use them at all. Database engines are very good at this; instead of you, the programmer, having to figure out exactly which resources you need to lock and when and how, you just say to the compiler or interpreter, "all of the stuff below this line needs to happen together, so make sure nobody else messes around with it while I'm using it." And the engine will figure out the locking for you. You almost never get this kind of simplicity in a general-purpose programming language, but the closer you can come, the better. Thread-safe objects that combine multiple common operations into one are a start (see the second sketch after this list).

  • Automatic parallelism. Let's say you have to iterate through a long list of items and transform them somehow, like multiplying 50,000 10x10 matrices. Wouldn't it be nice if you could just tell the compiler: Hey, each operation can be done independently, so use a separate CPU core for each one? Without having to actually implement the threading yourself? Some languages support this kind of thing; for example, the .NET team has been working on PLINQ.
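
Picking up the synchronization-primitives point: a reader-writer lock lets many readers proceed concurrently where an exclusive lock would serialize them. A minimal Go sketch (the Counters type and its method names are just for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// Counters guards a map with a reader-writer lock: many goroutines can
// call Get concurrently; only Inc takes the exclusive (write) lock.
type Counters struct {
	mu sync.RWMutex
	m  map[string]int
}

func (c *Counters) Get(key string) int {
	c.mu.RLock() // shared lock: readers don't block each other
	defer c.mu.RUnlock()
	return c.m[key]
}

func (c *Counters) Inc(key string) {
	c.mu.Lock() // exclusive lock: blocks readers and other writers
	defer c.mu.Unlock()
	c.m[key]++
}

func main() {
	c := &Counters{m: make(map[string]int)}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc("hits")
			_ = c.Get("hits")
		}()
	}
	wg.Wait()
	fmt.Println(c.Get("hits")) // 8
}
```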

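And one concrete instance of the combined-operations idea: Go's sync.Map bundles check-then-insert into a single thread-safe call, LoadOrStore, so the caller never has to lock around the two steps separately. A small sketch:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var cache sync.Map

	// LoadOrStore is check-then-insert as one atomic operation: without
	// it, a separate "if missing, then insert" sequence could race with
	// another goroutine doing the same thing.
	actual, loaded := cache.LoadOrStore("config", "default-value")
	fmt.Println(actual, loaded) // default-value false (we stored it)

	actual, loaded = cache.LoadOrStore("config", "other-value")
	fmt.Println(actual, loaded) // default-value true (already present)
}
```
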
Those are just a few examples of things that can make your life easier in parallel/multi-threaded applications. I'm sure that there are many more.

Answered by Aaronaught


In languages that are not designed for concurrency, you must rely upon low-level system calls and manage a lot of things yourself. In contrast, a programming language designed for concurrency, like Erlang, will provide high-level constructs that hide the low-level details. This makes it easier to reason about the correctness of your code, and also results in more portable code.
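
To make "high-level constructs" concrete, here is a minimal sketch of an Erlang-style process modeled in Go (one of the languages mentioned in the question) as a goroutine that owns its state and is reached only through a channel. The counter actor and its message type are illustrative, not a standard library API.

```go
package main

import "fmt"

// inc is a request message: the actor replies on the embedded channel,
// so no state is ever shared directly between goroutines.
type inc struct {
	amount int
	reply  chan int
}

// counter is an Erlang-style "process": a goroutine that owns its state
// and communicates only through its mailbox channel.
func counter(mailbox <-chan inc) {
	total := 0
	for msg := range mailbox {
		total += msg.amount
		msg.reply <- total
	}
}

func main() {
	mailbox := make(chan inc)
	go counter(mailbox)

	for i := 1; i <= 3; i++ {
		reply := make(chan int)
		mailbox <- inc{amount: i, reply: reply}
		fmt.Println(<-reply) // 1, 3, 6
	}
}
```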

Also, in a programming language designed for concurrency, there are typically only a handful of ways to do concurrent things, which leads to consistency. In contrast, if the programming language was not designed for concurrency, then different libraries and different programmers will do things in different ways, making it difficult to choose between approaches.

It's a bit like the difference between a programming language with automated garbage collection and one without. Without the automation, the programmer has to think a lot about implementation details.

The difference between multithreaded programming and multi-process programming (i.e., fork()) is that a multithreaded program may be more efficient, because data doesn't have to be passed across process boundaries, but a multi-process approach may be more robust, since a crash in one process doesn't bring down the others.

Answered by Kristopher Johnson


Regarding your question of why fork() instead of threading: when you use separate processes, you get automatic separation of address spaces. In multithreaded programs, it is very common for threads to communicate using their (naturally) shared memory. This is very efficient, but it is also hard to get all the synchronization between threads right, and this is why some languages are better at multithreading than others: they provide better abstractions to handle the common cases of communication between threads.

With separate processes, you don't have these problems to the same extent. Typically, you set up communication between processes to follow some form of message-passing pattern, which is easier to get right. (Well, you can use shared memory between processes too, but that's not as common as message passing.) On Unix systems, fork() has typically been very cheap, so the traditional design of concurrent programs on Unix uses processes, with pipes to communicate between them; on systems where process creation is an expensive operation, threads are often regarded as the better approach.
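
For comparison, here is the process-and-pipes style, sketched in Go to keep the examples in one language: the parent talks to a child process purely through its standard-input and standard-output pipes, so there is no shared memory to synchronize. This assumes a Unix-like system where the tr command is available.

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
	"strings"
)

func main() {
	// Run "tr a-z A-Z" as a separate process; the only communication
	// is message passing over the stdin/stdout pipes.
	cmd := exec.Command("tr", "a-z", "A-Z")
	cmd.Stdin = strings.NewReader("message passing across a pipe\n")

	out, err := cmd.Output()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out)) // MESSAGE PASSING ACROSS A PIPE
}
```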

Answered by JaakkoK