I've worked a bit with parallel processing in college and now I'm trying to get better at it. I can write code that runs in parallel and start up threads, but after that I lose control over what the threads do. I would like to know how I can control the threads to do things like, for example, bind a specific thread to a specific processor core.
I am mostly interested in C++, but I've also done some of this in Java, so answers for either are welcome.
In short: yes, a thread can run on different cores.
Thread scheduling is managed by the operating system. Threads are created using OS system calls and, if the process runs on a multi-core processor, the OS automatically tries to schedule different threads on different cores.
Within a single multithreaded process on a shared-memory multiprocessor, each thread can run concurrently on a separate core, giving parallel execution, that is, true simultaneous execution.
Contrary to the advice of some of the other respondents, for some systems (certainly high frequency trading and no doubt many other very low-latency systems such as search engines), binding a thread to a CPU core (or for hyper-threaded cores, a single CPU thread) can have enormous performance benefits.
The naive but increasingly rejected view is that adding more threads (within reason) increases throughput for such systems. However, the evidence is mounting that, when designed properly, solutions which use very few threads for the majority of processing are likely to outperform high-concurrency solutions considerably - sometimes by factors of ten, or even one hundred.
The principal reason for this is context switching. A context switch is the process in which a CPU core flushes the working state of the current thread to cache (if you're lucky) or main RAM (if you're not) and reads in the working state of the next thread - and it is one of the most expensive operations a low-latency system can perform.
If you wish to minimise context switching where low latency is paramount, certain critical processes may well be best restricted to a single core or CPU thread. Where it is necessary for multiple threads to read or write data which is managed by those critical thread-restricted processes, you might wish to look at the "Disruptor" pattern, which uses a ring buffer plus a number of clever tricks to allow very fast access to shared data whilst hardly ever requiring an exclusive lock on that data (linked below).
To achieve thread affinity (CPU binding) tasks in an OS-independent manner in Java, you can use Peter Lawrey's Java Thread Affinity library, also linked below. Also note the example in which Peter binds a reader thread to one hyper-thread of a hyper-threaded core, and a writer thread to the other one, a trick which I could envisage having appreciable benefits (though I have not tried it).
Barney
http://lmax-exchange.github.io/disruptor/
https://github.com/peter-lawrey/Java-Thread-Affinity/wiki/How-it-works