I wonder whether more than 8 threads can run concurrently on hardware with 8 cores.
If so, when using OpenMP to parallelize N calculations, I could create chunks of size, say, N/8, and within each thread fork again into chunks of (N/8)/8, and maybe go even further?
What happens when I nest parallel regions? Do I still have 8 threads available for the nested parallel region?
Thanks!!
8 cores can run at most 8 threads concurrently at any given point in time. However, a lot depends on what your threads are doing. If they are doing CPU-intensive tasks, it is not recommended to spawn many more threads than the number of cores (a few more may be OK). Otherwise, excessive context switching and cache misses will start to degrade performance. However, if there is significant I/O, the threads may be blocked much of the time and not using the CPU, so you can run many more of them in parallel.
The bottom line is that you need to measure the performance in your particular case, in your particular environment.
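To see what nesting actually gives you on your machine, here is a minimal sketch in C that counts how many threads execute the innermost region. The function name `count_nested_threads` is made up for illustration. It assumes a compiler with OpenMP 3.0 or later enabled (for example `-fopenmp` with GCC); without OpenMP the pragmas are ignored and it simply returns 1. The runtime may grant fewer threads than requested, so treat the result as something to measure, not a guarantee.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns how many threads actually ran the innermost
   region when an outer team of `outer` threads each requests an inner team
   of `inner` threads. */
int count_nested_threads(int outer, int inner)
{
    int total = 0;

    assert(outer > 0 && inner > 0);

#ifdef _OPENMP
    omp_set_dynamic(0);           /* ask the runtime not to shrink teams */
    omp_set_max_active_levels(2); /* permit two nested active parallel levels */
#endif

    #pragma omp parallel num_threads(outer)
    {
        /* Each outer thread opens its own inner team. */
        #pragma omp parallel num_threads(inner)
        {
            #pragma omp atomic
            total++;
        }
    }
    return total;
}
```

Calling `count_nested_threads(8, 8)` on an 8-core machine may report up to 64 threads, but they still share the same 8 cores, so for CPU-bound work the extra threads mostly add scheduling overhead. If nested parallelism is left disabled, which is a common default, the inner regions run with one thread each.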
See also this related thread.
Many modern CPUs support hyper-threading (simultaneous multithreading).
This means that each core can run two or more hardware threads at the same time.
So the number of threads that can run simultaneously is:
total_threads = num_procs * hyperthreading_factor
Typically, the hyperthreading factor is 2.
For a CPU-intensive workload, run total_threads threads. For an I/O-intensive workload, use about total_threads * 2 threads. This way the computation of some threads overlaps with the I/O of other threads.
These are the rules of thumb I follow. You may adjust them depending on the workload.
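The rules of thumb above can be written down as a small sketch in C. The names `hardware_threads`, `suggested_threads`, and `workload_t` are made up for illustration, and the factor of 2 for I/O-heavy work is just the heuristic from this answer, not a measured optimum.

```c
#include <assert.h>

/* Hypothetical workload classification used by the heuristic below. */
typedef enum { CPU_BOUND, IO_BOUND } workload_t;

/* total_threads = num_procs * hyperthreading factor */
int hardware_threads(int num_procs, int ht_factor)
{
    assert(num_procs > 0 && ht_factor > 0);
    return num_procs * ht_factor;
}

/* CPU-bound: one thread per hardware thread.
   I/O-bound: twice as many, so blocked threads can be overlapped. */
int suggested_threads(int num_procs, int ht_factor, workload_t kind)
{
    int total = hardware_threads(num_procs, ht_factor);
    return (kind == IO_BOUND) ? 2 * total : total;
}
```

For 8 physical cores with a hyperthreading factor of 2, `suggested_threads(8, 2, CPU_BOUND)` gives 16 and `suggested_threads(8, 2, IO_BOUND)` gives 32. Note that in OpenMP, `omp_get_num_procs()` usually already counts logical processors, so in that case you would pass a factor of 1 rather than multiplying again.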