
Is there a standard, strided version of memcpy?

Tags: c, memcpy, stride

I have a column vector A which is 10 elements long. I have a matrix B which is 10 by 10. The memory storage for B is column major. I would like to overwrite the first row in B with the column vector A.

Clearly, I can do:

for ( int i=0; i < 10; i++ )
{
    B[0 + 10 * i] = A[i];
}

where I've left the zero in 0 + 10 * i to highlight that B uses column-major storage (zero is the row-index).

After some shenanigans in CUDA-land tonight, it occurred to me that there might be a CPU function that performs a strided memcpy. I guess that, at a low level, performance would depend on the existence of a strided load/store instruction, which I don't recall there being in x86 assembly.

M. Tibbits asked May 16 '11 06:05



1 Answer

Short answer: The code you have written is as fast as it's going to get.

Long answer: The memcpy function is written using some complicated intrinsics or assembly because it operates on memory operands that have arbitrary size and alignment. If you are overwriting a column of a matrix, then your operands will have natural alignment, and you won't need to resort to the same tricks to get decent speed.
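For reuse, the plain loop can be wrapped in a small helper. The following is only a minimal sketch: the names strided_copy and set_row_colmajor, their signatures, and the double element type are assumptions made for illustration (the question does not state the element type), and neither function is part of the C standard library. Strides are counted in elements, not bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not a standard library function.
 * Copies n doubles from src to dst. After each element, src advances by
 * src_stride elements and dst advances by dst_stride elements. */
void strided_copy(double *dst, size_t dst_stride,
                  const double *src, size_t src_stride,
                  size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i * dst_stride] = src[i * src_stride];
    }
}

/* Hypothetical convenience wrapper for the case in the question:
 * overwrite row `row` of a column-major matrix B (rows x cols) with the
 * cols-element contiguous vector A. In column-major storage, consecutive
 * elements of one row are `rows` elements apart. */
void set_row_colmajor(double *B, size_t rows, size_t cols,
                      size_t row, const double *A)
{
    strided_copy(B + row, rows, A, 1, cols);
}
```

For the 10 by 10 matrix in the question, set_row_colmajor(B, 10, 10, 0, A) performs the same assignments as the original loop, so it should not be expected to run any faster; it only names the pattern.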

Dietrich Epp answered Sep 22 '22 10:09