I am not sure about that. Can I run a large memset (for example 10 MB) on four cores to gain a speedup?
Is such RAM-chip parallelization possible at all? And how large is the time cost of starting the other threads: more than a millisecond or less?
You are raising the right question, but it is difficult to give a simple answer to it. Several aspects are involved.
Bigger PCs have several memory buses (channels); smaller ones have only one. On a system with a single memory bus this does not make any sense. If your system has several memory channels, your array may be split arbitrarily between memory banks. If the whole array happens to sit in the same memory bank, the parallelization will be useless. Figuring out the layout of your array is itself an overhead. In other words, before splitting the operation between cores, you need to work out whether it is worth doing at all.
The simple answer is that these hard-to-predict overheads will most likely consume the benefit and make the overall result worse.
At the same time, for a really huge memory area, it does make sense on some architectures.
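Since the answer depends on the hardware, the practical way to settle both questions on a given machine is to measure. Below is a minimal sketch, assuming a POSIX system with pthreads (for example Linux); `parallel_memset`, `time_memset` and `thread_spawn_cost` are hypothetical helper names, not a library API. It cuts the buffer into contiguous slices of virtual addresses, which says nothing about which memory channel each physical page actually lands on, so it tests the idea empirically rather than exploiting the memory layout.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define MAX_THREADS 64

/* One contiguous slice of the buffer, handled by one thread. */
struct memset_job {
    unsigned char *start;
    int value;
    size_t len;
};

static void *memset_worker(void *arg)
{
    struct memset_job *job = arg;
    memset(job->start, job->value, job->len);
    return NULL;
}

/* Hypothetical helper: split buf into nthreads slices; slice 0 runs on the
   calling thread, the others on new threads. If a thread cannot be created,
   its slice is filled on the calling thread, so the result is always correct. */
void parallel_memset(void *buf, int value, size_t len, int nthreads)
{
    pthread_t tids[MAX_THREADS];
    struct memset_job jobs[MAX_THREADS];
    int created[MAX_THREADS] = {0};
    unsigned char *p = buf;

    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;

    size_t chunk = len / (size_t)nthreads;
    for (int i = 0; i < nthreads; i++) {
        jobs[i].start = p + (size_t)i * chunk;
        jobs[i].value = value;
        /* the last slice also takes the remainder */
        jobs[i].len = (i == nthreads - 1) ? len - (size_t)i * chunk : chunk;
    }
    for (int i = 1; i < nthreads; i++)
        created[i] = (pthread_create(&tids[i], NULL, memset_worker, &jobs[i]) == 0);

    memset_worker(&jobs[0]);              /* the caller does its own share */

    for (int i = 1; i < nthreads; i++) {
        if (created[i])
            pthread_join(tids[i], NULL);
        else
            memset_worker(&jobs[i]);      /* fallback: fill it here */
    }
}

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Wall-clock seconds for one parallel_memset call (including thread startup). */
double time_memset(void *buf, size_t len, int nthreads)
{
    double t0 = now_sec();
    parallel_memset(buf, 0, len, nthreads);
    return now_sec() - t0;
}

static void *noop(void *arg) { return arg; }

/* Average seconds to create and join one thread that does no work. */
double thread_spawn_cost(int reps)
{
    double t0 = now_sec();
    for (int i = 0; i < reps; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, noop, NULL) == 0)
            pthread_join(t, NULL);
    }
    return (now_sec() - t0) / reps;
}
```

To use it, compile with something like `cc -O2 -pthread`, allocate a 10 MB buffer, write to it once so first-touch page faults are not counted, then compare `time_memset(buf, len, 1)` with `time_memset(buf, len, 4)` and print `thread_spawn_cost(1000)`. The numbers depend entirely on the machine, so no expected values are given here.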