I want to run four threads doing completely independent things in parallel (not concurrently)1. I'm new to parallelism and I have a couple of questions. The reason I want to do this is that performance is really important to me. I'm working on a 4-core Windows machine, using C++ in Visual Studio Community 2015.
Should I try to schedule the threads myself, so that each one runs on a different core, or should I leave that to the OS scheduler? My intuition is that it would be faster if I forced each thread onto its own core. How can I do that?
This is what I have tried so far:
#include <thread>

void t1() { /* do something */ }
void t2() { /* do something */ }
void t3() { /* do something */ }
void t4() { /* do something */ }

int main() {
    std::thread thread1(t1);
    std::thread thread2(t2);
    std::thread thread3(t3);
    std::thread thread4(t4);

    thread1.join();
    thread2.join();
    thread3.join();
    thread4.join();
}
I know that join() blocks the calling thread until the joined thread finishes, but I'm not sure whether the threads themselves actually run at the same time. Is my code executing the threads concurrently or in parallel?
1Note:
Concurrency is essentially when two tasks are being performed at the same time. This might mean that one is 'paused' for a short duration, while the other is being worked on.
Parallelism requires that at least two processes/tasks are actively being performed at a particular moment in time.
With simultaneous multithreading (hyperthreading), a single CPU core can run up to 2 hardware threads. For example, a dual-core CPU with hyperthreading exposes 4 logical processors.
In short: yes, a thread can run on different cores.
Each CPU core can run only one thread at any given moment, so with hyperthreading disabled, the maximum number of threads that can execute in parallel equals the number of cores: 4 on a quad-core machine, one thread per core.
There is no standard C++ way to set the affinity of a given thread. Under the hood, std::thread is implemented with POSIX threads on Linux/Unix and with Windows threads on Windows, so the solution is to use the native APIs. For example, on Windows the following code will fully utilize all 8 cores of my i7 CPU:
#include <windows.h>
#include <cassert>
#include <thread>
#include <vector>

int main() {
    auto fn = []() { while (true) { /* busy spin */ } };

    std::vector<std::thread> at;
    const int num_of_cores = 8;
    for (int n = 0; n < num_of_cores; n++) {
        at.push_back(std::thread(fn));
        // Pin the new thread to core n.
        // (On POSIX systems, use pthread_setaffinity_np instead.)
        DWORD_PTR res = SetThreadAffinityMask(at.back().native_handle(), 1u << n);
        assert(res != 0);  // 0 means the call failed
    }
    for (auto& t : at) t.join();
}
But after commenting out the SetThreadAffinityMask call I still get the same result: all the cores are fully utilized. So the Windows scheduler does a good job on its own.
If you want better control over how your code uses the system's cores, look into libraries like OpenMP, TBB (Threading Building Blocks), or PPL, in that order.
You're done, no need to schedule anything. As long as there are multiple processors available, your threads will run simultaneously on available cores.
If there are fewer than 4 processors available, say 2, your threads will run in an interleaved manner, with up to 2 running at any given time.
P.S. It's also easy to see this for yourself: just write 4 infinite loops and run them in 4 different threads. You will see all 4 CPU cores being used.
DISCLAIMER: Of course, "under the hood" the OS does the scheduling for you, so you depend on the quality of the OS scheduler for concurrency. The fairness of that scheduler is outside the C++ standard and therefore not guaranteed. In practice, though, especially when learning to write concurrent applications, most modern OSes provide adequate fairness when scheduling threads.