 

What is the advantage of using memset() in C

Tags: c, embedded, memset

I was curious whether there is any efficiency advantage to using memset() in a situation like the one below.

Given the following buffer declarations...

struct More_Buffer_Info
{
    unsigned char a[10];
    unsigned char b[10];
    unsigned char c[10];
};

struct My_Buffer_Type
{
    struct More_Buffer_Info buffer_info[100];
};

struct My_Buffer_Type my_buffer[5];

unsigned char *p;
p = (unsigned char *)my_buffer;

Besides being fewer lines of code, is there an advantage to using this:

memset((void *)p, 0, sizeof(my_buffer));

Over this:

size_t i;

for (i = 0; i < sizeof(my_buffer); i++)
{
    *p++ = 0;
}
asked Dec 16 '11 by embedded_guy


People also ask

Why do we use memset in c?

memset() is used to fill a block of memory with a particular value. The syntax of the memset() function is as follows:

// ptr ==> starting address of the memory to be filled
// x   ==> value to be filled in
// n   ==> number of bytes to fill, starting from ptr
void *memset(void *ptr, int x, size_t n);

Why memset is faster?

It is because memset()'s implementation is optimized for the size of the block it operates on and for the target architecture. Looking at the gcc disassembly offers some insight:

int A[1];
memset(A, 0, sizeof(A));

What does memset return in c?

The memset() function sets the first count bytes of dest to the value c . The value of c is converted to an unsigned character. The memset() function returns a pointer to dest .

What is memset and memcpy in c?

memset() is used to set all the bytes in a block of memory to a particular char value; it only works with a single-byte fill value. memcpy() copies bytes between memory regions. The type of the data being copied is irrelevant; it just makes a byte-for-byte copy.


2 Answers

This applies to both memset() and memcpy():

  1. Less Code: As you have already mentioned, it's shorter - fewer lines of code.
  2. More Readable: Shorter usually makes it more readable as well. (memset() is more readable than that loop)
  3. It can be faster: It can sometimes allow more aggressive compiler optimizations. (so it may be faster)
  4. Misalignment: In some cases, when you're dealing with misaligned data on a processor that doesn't support misaligned accesses, memset() and memcpy() may be the only clean solution.

To expand on the 3rd point, memset() can be heavily optimized by the compiler using SIMD and such. If you write a loop instead, the compiler will first need to "figure out" what it does before it can attempt to optimize it.

The basic idea here is that memset() and similar library functions, in some sense, "tell" the compiler your intent.


As mentioned by @Oli in the comments, there are some downsides. I'll expand on them here:

  1. You need to make sure that memset() actually does what you want. The standard doesn't guarantee that the all-bits-zero pattern represents the value zero for every datatype (floating-point values and null pointers, in particular, need not be all-bits-zero in memory).
  2. For non-zero data, memset() is restricted to a single-byte fill value. So you can't use memset() if you want to set an array of ints to something other than zero (or 0x01010101 or some other all-bytes-equal pattern).
  3. Although rare, there are some corner cases where it's actually possible to beat the compiler in performance with your own loop.*

*I'll give one example of this from my experience:

Although memset() and memcpy() are usually compiler intrinsics with special handling by the compiler, they are still generic functions. They say nothing about the datatype including the alignment of the data.

So in a few (albeit rare) cases, the compiler isn't able to determine the alignment of the memory region, and thus must produce extra code to handle misalignment. Whereas if you, the programmer, are 100% sure of the alignment, using a loop might actually be faster.

A common example is when using SSE/AVX intrinsics (such as copying a 16/32-byte aligned array of floats). If the compiler can't determine the 16/32-byte alignment, it will need to use misaligned loads/stores and/or extra handling code. If you simply write a loop using SSE/AVX aligned load/store intrinsics, you can probably do better.

float *ptrA = ...  //  some unknown source, guaranteed to be 32-byte aligned
float *ptrB = ...  //  some unknown source, guaranteed to be 32-byte aligned
int length = ...   //  some unknown source, guaranteed to be multiple of 8

//  memcpy() - The compiler can't read comments. It doesn't know the data is
//  32-byte aligned, so it may generate unnecessary misalignment-handling code.
memcpy(ptrA, ptrB, length * sizeof(float));

//  This loop could potentially be faster because it "uses" the fact that
//  the pointers are aligned. The compiler can also further optimize this.
for (int c = 0; c < length; c += 8){
    _mm256_store_ps(ptrA + c, _mm256_load_ps(ptrB + c));
}
answered Nov 15 '22 by Mysticial


It depends on the quality of the compiler and the libraries. In most cases memset is superior.

The advantage of memset is that in many platforms it is actually a compiler intrinsic; that is, the compiler can "understand" the intention to set a large swath of memory to a certain value, and possibly generate better code.

In particular, that could mean using specific hardware operations for setting large regions of memory, like SSE on the x86, AltiVec on the PowerPC, NEON on the ARM, and so on. This can be an enormous performance improvement.

On the other hand, by using a for loop you are telling the compiler to do something more specific, "load this address into a register. Write a number to it. Add one to the address. Write a number to it," and so on. In theory a perfectly intelligent compiler would recognize this loop for what it is and turn it into a memset anyway; but I have never encountered a real compiler that did this.

So, the assumption is that memset was written by smart people to be the very best and fastest possible way to set a whole region of memory, for the specific platform and hardware the compiler supports. That is often, but not always, true.

answered Nov 15 '22 by Crashworks