 

Will multithreading improve performance significantly if I have a fixed number of calculations that are independent of each other?

I am programming a raycasting game engine. Each ray can be calculated without knowing anything about the other rays (I'm only calculating distances). Since there is no waiting time between calculations, I wonder whether it's worth the effort to make the ray calculations multithreaded or not. Is it likely that there will be a performance boost?

asked Aug 18 '14 by Marius Anderie


2 Answers

Most likely, multi-threading will improve performance if done correctly. The way you've stated your problem, it is a perfect candidate for multi-threading, since the computations are independent, reducing the need for coordination between threads to a minimum.

Some reasons you still might not get a speedup, or may not get the full speedup you expect, include:

1) The bottleneck may not be on-die CPU execution resources (e.g., ALU throughput), but rather a shared resource such as main memory or last-level cache (LLC) bandwidth.

For example, on some architectures a single thread may be able to saturate memory bandwidth, so adding more cores may not help at all. A more common case is that a single core can saturate some fraction 1/N of main memory bandwidth, where N is smaller than the core count C. In that case, for a memory-bound computation, you get good scaling only up to N cores. For instance, on a 4-core box, one core may be able to consume 50% of the bandwidth; then you'll get good scaling to 2 cores (using 100% of bandwidth), but little to none above that.

Other resources which are shared among cores include disk and network IO, GPU, snoop bandwidth, etc. If you have a hyper-threaded platform, this list increases to include all levels of cache and ALU resources for logical cores sharing the same physical core.

2) Contention "in practice" between operations which are "theoretically" independent.

You mention that your operations are independent. Typically this means that they are logically independent - they don't share any data (other than perhaps immutable input) and they can write to separate output areas. That doesn't exclude the possibility, however, that any given implementation actually has some hidden sharing going on.

One classic example is false-sharing - where independent variables fall in the same cache line, so logically independent writes to different variables from different threads end up thrashing the cache line between cores.

Another example, frequently encountered in practice, is contention via library calls - if your routines use malloc heavily, you may find that all the threads spend most of their time waiting on a lock inside the allocator, since malloc is a shared resource. This can be remedied by reducing reliance on malloc (perhaps via fewer, larger mallocs) or with a good concurrent allocator such as Hoard or tcmalloc.

3) The implementation of distributing and collecting the computation across threads may overwhelm the advantage you get from multiple threads. For example, if you spin up a new thread for every individual ray, the thread-creation overhead would dominate your runtime and you would likely see a negative benefit. Even if you use a thread pool of persistent threads, choosing a "work unit" that is too fine-grained will impose a lot of coordination overhead, which may eliminate your benefits.

Similarly, if you have to copy the input data to and from the worker threads, you may not see the scaling you expect. Where possible, use pass-by-reference for read-only data.

4) You don't have more than 1 core, or you do have more than 1 core but they are already occupied running other threads or processes. In these cases, the effort to coordinate multiple threads is pure overhead.

answered Oct 21 '22 by BeeOnRope


In general, it depends. Given that the calculations are independent, it sounds like this is a good candidate for potential performance improvements due to threading. Ray calculations typically can benefit from this.

However, there are many other factors, such as memory access patterns and the underlying system on which this runs, which will have a tremendous impact on this. It's quite possible for a multithreaded version to run slower than the single-threaded version if not written correctly, so profiling is the only way to answer this definitively.

answered Oct 21 '22 by Reed Copsey