Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is multi-thread memory access faster than single threaded memory access?

Is multi-thread memory access faster than single threaded memory access?

Assume we are in C language. A simple example is as follows. If I have a gigantic array A and I want to copy A to array B with the same size as A. Is using multithreading to do memory copy faster than it with a single thread? How many threads are suitable to do this kind of memory operation?

EDIT: Let me put the question more narrow. First of all, we do not consider the GPU case. The memory access optimization is very important and effective when we do GPU programming. In my experience, we always need to be careful about the memory operations. On the other hand, it is not always the case when we work on CPU. In addition, let's not consider about the SIMD instructions, such as avx and sse. Those will also show memory performance issues when the program has too many memory access operations as opposed to a lot of computational operations. Assume that we work an x86 architecture with 1-2 CPUs. Each CPU has multiple cores and a quad channel memory interface. The main memory is DDR4, as it is common today.

My array is an array of double precision floating point numbers with the size similar to the size of L3 cache of a CPU, that is roughly 50MB. Now, I have two cases: 1) copy this array to another array with the same size using by doing element-wise copy or by using memcpy. 2) combine a lot of small arrays into this gigantic array. Both are real-time operations, meaning that they need to be done as fast as possible. Does multi-threading give a speedup or a dropdown? What's the factor in this case that affects the performance of memory operations?

Someone said it will mostly depend on DMA performance. I think it is when we do memcpy. What if we do element-wise copy, does the pass through the CPU cache first?

like image 551
user3677630 Avatar asked Feb 07 '17 21:02

user3677630


1 Answers

It depends on many factors. One factor is the hardware you use. On modern PC hardware, multithreading will most likely not lead to performance improvement, because CPU time is not the limiting factor of copy operations. The limiting factor is the memory interface. The CPU will most likely use the DMA controller to do the copying, so the CPU will not be too busy when copying data.

like image 103
Xaver Avatar answered Nov 12 '22 08:11

Xaver