Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it feasible to implemenent Linux concurrency primitives that give better isolation than threads but comparable performance?

Consider a following application: a web search server that upon start creates a large in-memory index of web pages based on data read from disk. Once initialized, in-memory index can not be modified and multiple threads are started to serve user queries. Assume the server is compiled to native code and uses OS threads.

Now, threading model gives no isolation between threads. A buggy thread or any non thread safe code, can corrupt the index or corrupt memory that was allocated by and logically belongs to some other thread. Such problems are difficult to detect and debug.

Theoretically, Linux allows to enforce a better isolation. Once index is initialized, memory it occupies can be marked read only. Threads can be replaced with processes that share the index (shared memory) but other than that have separate heaps and can not corrupt each other. Illegal operation are automatically detected by hardware and the operating system. No mutexes or other synchronization primitives are needed. Memory related data races are completely eliminated.

Is such model feasible in practice? Are you aware of any real life application that do such things? Or maybe there are some fundamental difficulties that make such model impractical? Do you think such approach would introduce a performance overhead compared to traditional threads? Theoretically, memory that is used is the same, but are there some implementation-related issues that would make things slower?

like image 526
Jan Wrobel Avatar asked Sep 05 '12 20:09

Jan Wrobel


2 Answers

The obvious solution is to not use threads at all. Use separate processes. Since each process has much in common with code and readonly structures, making the readonly data shared is trivial: format it as needed for in-memory use within a file and map the file to memory.

Using this scheme, only the variable per-process data would be independent. The code would be shared and statically initialized data would be shared until written. If a process croaks, there is zero impact on other processes. No concurrency issues at all.

like image 115
wallyk Avatar answered Oct 26 '22 09:10

wallyk


You can use mprotect() to make your index read-only. On a 64-bit system you can map the local memory for each thread at a random address (see this Wikipedia article on address space randomization) which makes the odds of memory corruption from one thread touching another astronomically small (and of course any corruption that misses mapped memory altogether will cause a segfault). Obviously you'll need to have different heaps for each thread.

like image 24
Ben Jackson Avatar answered Oct 26 '22 09:10

Ben Jackson