I was curious whether there is any efficiency advantage to using memset() in a situation like the one below.
Given the following buffer declarations...
struct More_Buffer_Info
{
    unsigned char a[10];
    unsigned char b[10];
    unsigned char c[10];
};
struct My_Buffer_Type
{
    struct More_Buffer_Info buffer_info[100];
};
struct My_Buffer_Type my_buffer[5];
unsigned char *p;
p = (unsigned char *)my_buffer;
Besides having fewer lines of code, is there an advantage to using this:
memset((void *)p, 0, sizeof(my_buffer));
Over this:
for (i = 0; i < sizeof(my_buffer); i++)
{
    *p++ = 0;
}
memset() is used to fill a block of memory with a particular value. The syntax of the memset() function is as follows:
// ptr ==> Starting address of memory to be filled
// x   ==> Value to be filled
// n   ==> Number of bytes to be filled, starting from ptr
void *memset(void *ptr, int x, size_t n);
This is because memset()'s implementation is optimized for the size of the block it operates on and, as Quora User pointed out, for the target architecture. Looking at the gcc disassembly offers some insight:
int A[1];
memset(A, 0, sizeof(A));
The memset() function sets the first count bytes of dest to the value c. The value of c is converted to an unsigned char. The memset() function returns a pointer to dest.
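The conversion and the return value described above can be checked with a small sketch. The helper name fill_returns_dest is hypothetical (it is not part of any library), and the example assumes 8-bit bytes, so 0x1FF is stored as 0xFF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fills buf[0..n) with `value` and reports whether
 * memset() returned the same pointer it was given. */
static int fill_returns_dest(unsigned char *buf, size_t n, int value)
{
    assert(buf != NULL);
    /* `value` is an int, but memset() converts it to unsigned char
     * before storing it, so only the low byte is written. */
    void *ret = memset(buf, value, n);
    return ret == (void *)buf;
}
```

Calling fill_returns_dest(buf, 3, 0x1FF) on a 4-byte buffer should leave the first three bytes equal to 0xFF and the fourth byte untouched.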
memset() sets all the bytes in a block of memory to a particular char value. memset() also only plays well with char as its initialization value. memcpy() copies bytes between memory regions. The type of the data being copied is irrelevant; it just makes byte-for-byte copies.
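A minimal sketch of what "only plays well with char" means in practice. The helper names (first_after_memset_one, fill_ints, copy_ints) are made up for illustration, and the 0x01010101 result assumes a 4-byte int.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demo: memset() with the value 1 writes the byte 0x01 into
 * every byte of the array, so a 4-byte int ends up as 0x01010101, not 1. */
static int first_after_memset_one(void)
{
    int a[4];
    memset(a, 1, sizeof a);
    return a[0];
}

/* Hypothetical helper: set every element to `value` with a plain loop,
 * which works for any int value. */
static void fill_ints(int *dst, size_t count, int value)
{
    assert(dst != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        dst[i] = value;
    }
}

/* Hypothetical helper: memcpy() copies raw bytes and does not care what
 * type the bytes represent. */
static void copy_ints(int *dst, const int *src, size_t count)
{
    memcpy(dst, src, count * sizeof *dst);
}
```

So memset(a, 0, sizeof a) is fine for zeroing ints, but a non-zero element value needs a loop like fill_ints().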
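This applies to both memset() and memcpy():
1. Less code: as you already mentioned, it takes fewer lines.
2. More readable: shorter usually means more readable as well (memset() is more readable than that loop).
3. It can be faster: it can sometimes allow more aggressive compiler optimizations.
4. Misalignment: in some cases, when you're dealing with misaligned data on a processor that doesn't support misaligned accesses, memset() and memcpy() may be the only clean solution.
To expand on the 3rd point, memset() can be heavily optimized by the compiler using SIMD and such. If you write a loop instead, the compiler first needs to "figure out" what it does before it can attempt to optimize it.
The basic idea here is that memset() and similar library functions, in some sense, "tell" the compiler your intent.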
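To make the comparison concrete, here is a sketch of the two forms from the question wrapped in functions. The function names clear_with_loop and clear_with_memset are hypothetical; both should leave the buffer in the same all-zero state, and the only difference is how directly the intent is stated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct More_Buffer_Info {
    unsigned char a[10];
    unsigned char b[10];
    unsigned char c[10];
};

struct My_Buffer_Type {
    struct More_Buffer_Info buffer_info[100];
};

/* Byte-by-byte loop, as in the question. The compiler has to recognize
 * the pattern before it can replace it with something faster. */
static void clear_with_loop(struct My_Buffer_Type *buf, size_t count)
{
    assert(buf != NULL);
    unsigned char *p = (unsigned char *)buf;
    for (size_t i = 0; i < count * sizeof *buf; i++) {
        *p++ = 0;
    }
}

/* One library call that states the intent directly. */
static void clear_with_memset(struct My_Buffer_Type *buf, size_t count)
{
    assert(buf != NULL);
    memset(buf, 0, count * sizeof *buf);
}
```

Whether the two compile to the same machine code depends on the compiler and optimization level, so measure before assuming either way.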
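As mentioned by @Oli in the comments, there are some downsides. I'll expand on them here:
1. You need to make sure that memset() actually does what you want. The standard doesn't say that zeros for the various datatypes are necessarily zero in memory; a null pointer or a floating-point zero, for example, is not guaranteed to be all zero bytes.
2. For non-zero data, memset() is restricted to 1-byte content. So you can't use memset() if you want to set an array of ints to something other than zero (or 0x01010101 or something...).
3. Although rare, there are corner cases where your own loop can actually beat memset().*
*I'll give one example of this from my experience: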
Although memset() and memcpy() are usually compiler intrinsics with special handling by the compiler, they are still generic functions. They say nothing about the datatype, including the alignment of the data.
So in a few (albeit rare) cases, the compiler isn't able to determine the alignment of the memory region and thus must produce extra code to handle misalignment. Whereas if you, the programmer, are 100% sure of the alignment, using a loop might actually be faster.
A common example is when using SSE/AVX intrinsics, such as when copying a 16/32-byte aligned array of floats. If the compiler can't determine the 16/32-byte alignment, it will need to use misaligned loads/stores and/or handling code. If you simply write a loop using SSE/AVX aligned load/store intrinsics, you can probably do better.
#include <immintrin.h>  // AVX intrinsics (_mm256_load_ps, _mm256_store_ps)
#include <string.h>     // memcpy()

float *ptrA = ... // some unknown source, guaranteed to be 32-byte aligned
float *ptrB = ... // some unknown source, guaranteed to be 32-byte aligned
int length = ... // some unknown source, guaranteed to be multiple of 8

// memcpy() - Compiler can't read comments. It doesn't know the data is 32-byte
// aligned. So it may generate unnecessary misalignment handling code.
memcpy(ptrA, ptrB, length * sizeof(float));

// This loop could potentially be faster because it "uses" the fact that
// the pointers are aligned. The compiler can also further optimize this.
for (int c = 0; c < length; c += 8){
    _mm256_store_ps(ptrA + c, _mm256_load_ps(ptrB + c));
}
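As a possible middle ground (not part of the original answer), GCC and Clang provide __builtin_assume_aligned(), which lets you pass the alignment guarantee to the compiler while still calling memcpy(). The sketch below assumes one of those compilers; the helper name copy_aligned_floats is hypothetical, and whether it actually produces better code than the plain memcpy() depends on the compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper. Assumption: GCC or Clang, which provide
 * __builtin_assume_aligned(). Both pointers must really be 32-byte
 * aligned, because the compiler is allowed to rely on that promise. */
static void copy_aligned_floats(float *dst, const float *src, size_t length)
{
    /* Verify the promise in debug builds. */
    assert((uintptr_t)dst % 32 == 0 && (uintptr_t)src % 32 == 0);

    float *d = __builtin_assume_aligned(dst, 32);
    const float *s = __builtin_assume_aligned(src, 32);

    /* Still a generic memcpy(), but now with alignment information. */
    memcpy(d, s, length * sizeof *d);
}
```

This keeps the readability of memcpy() and leaves the choice of instructions to the compiler, at the cost of being compiler-specific.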
It depends on the quality of the compiler and the libraries. In most cases memset is superior.
The advantage of memset is that on many platforms it is actually a compiler intrinsic; that is, the compiler can "understand" the intention to set a large swath of memory to a certain value, and possibly generate better code.
In particular, that could mean using specific hardware operations for setting large regions of memory, like SSE on the x86, AltiVec on the PowerPC, NEON on the ARM, and so on. This can be an enormous performance improvement.
On the other hand, by using a for loop you are telling the compiler to do something more specific, "load this address into a register. Write a number to it. Add one to the address. Write a number to it," and so on. In theory a perfectly intelligent compiler would recognize this loop for what it is and turn it into a memset anyway; but I have never encountered a real compiler that did this.
So, the assumption is that memset was written by smart people to be the very best and fastest possible way to set a whole region of memory, for the specific platform and hardware the compiler supports. That is often, but not always, true.