Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does the internal implementation of memcpy work?

How does the standard C function 'memcpy' work? It has to copy a (large) chunk of RAM to another area in the RAM. Since I know you cannot move straight from RAM to RAM in assembly (with the mov instruction) so I am guessing it uses a CPU register as the intermediate memory when copying?

But how does it copy? By blocks (how would it copy by blocks?), by individual bytes (char) or the largest data type they have (copy in long long double's - which is 12 bytes on my system).

EDIT: Ok apparently you can move data from RAM to RAM directly, I am not an assembly expert and all I have learnt about assembly is from this document (X86 assembly guide) which mentions in the section about the mov instruction that you cannot move from RAM to RAM. Apparently this isn't true.

like image 648
PersonWithName Avatar asked Jul 06 '13 01:07

PersonWithName


People also ask

How is memcpy implemented in C?

How to implement your own memcpy implementation in C? Implementation of memcpy is not a big deal, you need to typecast the given source and destination address to char* (1 byte). After the typecasting copy the data from the source to destination one by one until n (given length).

How memcpy function works?

memcpy() function in C/C++ The function memcpy() is used to copy a memory block from one location to another. One is source and another is destination pointed by the pointer. This is declared in “string. h” header file in C language.

Does memcpy overwrite or append?

memcpy replaces memory, it does not append.

How is Memmove implemented?

The memmove() function copies n bytes from memory area src to memory area dest. The memory areas may overlap: copying takes place as though the bytes in src are first copied into a temporary array that does not overlap src or dest, and the bytes are then copied from the temporary array to dest.


1 Answers

Depends. In general, you couldn't physically copy anything larger than the largest usable register in a single cycle, but that's not really how machines work these days. In practice, you really care less about what the CPU is doing and more about the characteristics of DRAM. The memory hierarchy of the machine is going to play a crucial determining role in performing this copy in the fastest possible manner (e.g., are you loading whole cache-lines? What's the size of a DRAM row with respect to the copy operation?). An implementation might instead choose to use some kind of vector instructions to implement memcpy. Without reference to a specific implementation, it's effectively a byte-for-byte copy with a one-place buffer.

Here's a fun article that describes one person's adventure into optimizing memcpy. The main take-home point is that it is always going to be targeted to a specific architecture and environment based on the instructions you can execute inexpensively.

like image 95
Gian Avatar answered Oct 05 '22 08:10

Gian