
Is memset() more efficient than for loop in C?

Is memset() more efficient than a for loop?

Considering this code:

    char x[500];
    memset(x, 0, sizeof(x));

And this one:

    char x[500];
    for (int i = 0; i < 500; i++)
        x[i] = 0;

Which one is more efficient, and why? Is there any special hardware instruction for block-level initialization?

David asked Sep 09 '11 21:09



2 Answers

Most certainly, memset will be much faster than that loop. Note that the loop handles one character at a time, whereas functions like memset are optimized to set several bytes at a time, using MMX and SSE instructions when available.
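To make "several bytes at a time" concrete, here is a minimal sketch in portable C. The name zero_wordwise is hypothetical, and this is not how any particular library implements memset; real implementations are usually hand-tuned and use wider vector stores. It only illustrates the idea of replacing eight byte stores with one 8-byte store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the real memset): zero n bytes starting at dst,
   writing eight bytes per store wherever possible. */
void zero_wordwise(void *dst, size_t n)
{
    unsigned char *p = dst;

    /* Head: single-byte stores until p sits on an 8-byte boundary. */
    while (n > 0 && ((uintptr_t)p % sizeof(uint64_t)) != 0) {
        *p++ = 0;
        --n;
    }

    /* Body: one 8-byte store per iteration. Using memcpy for the store
       sidesteps strict-aliasing problems; compilers typically turn it
       into a single move instruction. */
    const uint64_t zero = 0;
    while (n >= sizeof zero) {
        memcpy(p, &zero, sizeof zero);
        p += sizeof zero;
        n -= sizeof zero;
    }

    /* Tail: whatever is left (fewer than eight bytes). */
    while (n > 0) {
        *p++ = 0;
        --n;
    }
}
```

For the 500-byte array in the question, the body loop runs at most 62 times instead of 500, which is where the speedup over a literal byte-by-byte loop would come from.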

I think the paradigmatic example of these optimizations, which usually go unnoticed, is the GNU C library's strlen function. One would expect it to take n steps for a string of n bytes, but it actually takes roughly n/4 or n/8 steps depending on the architecture (yes, I know, in big-O terms that is still O(n), but you actually get roughly a quarter or an eighth of the time). How? Tricky, but nicely done: see the glibc strlen source.
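The word-at-a-time idea can be sketched as follows. This is an illustration written for this answer, not the glibc source; strlen_wordwise is a hypothetical name. It assumes 8-byte words and, importantly, that the caller's buffer extends at least to the next 8-byte boundary after the terminator, because the loop reads whole aligned words. ISO C does not permit reading past the end of an object, so a library can rely on this only because it knows its platform.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a word-at-a-time strlen.
   ASSUMPTION: the buffer holding s is readable up to the next 8-byte
   boundary after the terminating '\0'. */
size_t strlen_wordwise(const char *s)
{
    const char *p = s;

    /* Check byte by byte until p is 8-byte aligned. */
    while (((uintptr_t)p % sizeof(uint64_t)) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        ++p;
    }

    /* (w - ones) & ~w & highs is nonzero exactly when some byte of w is 0. */
    const uint64_t ones  = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);      /* one aligned 8-byte load */
        if (((w - ones) & ~w & highs) != 0)
            break;                    /* this word contains the terminator */
        p += sizeof w;
    }

    /* Locate the exact zero byte inside the final word. */
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}
```

Each loop iteration tests eight bytes with a handful of arithmetic instructions, which is the source of the "n/8" figure above.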

Diego Sevilla answered Oct 02 '22 08:10


Well, why don't we take a look at the generated assembly code, compiled with full optimization under VS 2010?

    char x[500];
    char y[500];
    int i;

    memset(x, 0, sizeof(x) );
    003A1014  push        1F4h
    003A1019  lea         eax,[ebp-1F8h]
    003A101F  push        0
    003A1021  push        eax
    003A1022  call        memset (3A1844h)

And your loop...

    char x[500];
    char y[500];
    int i;

    for( i = 0; i < 500; ++i )
    {
        x[i] = 0;
    00E81014  push        1F4h
    00E81019  lea         eax,[ebp-1F8h]
    00E8101F  push        0
    00E81021  push        eax
    00E81022  call        memset (0E81844h)
        /* note that this is *replacing* the loop,
           not being called once for each iteration. */
    }

So, under this compiler, the generated code is exactly the same. memset is fast, and the compiler is smart enough to know that you are doing the same thing as calling memset once anyway, so it does it for you.

If the compiler actually left the loop as-is, it would likely be slower, because you can set more than one byte at a time (i.e., you could unroll your loop a bit at a minimum). You can assume that memset will be at least as fast as a naive implementation such as the loop. Try it under a debug build and you will notice that the loop is not replaced.

That said, it depends on what the compiler does for you. Looking at the disassembly is always a good way to know exactly what is going on.
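If you want to repeat the comparison with your own compiler, a small file like the following is enough. The function names and the BUF_SIZE constant are made up for this sketch. Compile it to an assembly listing (for example with gcc -O2 -S, or cl /O2 /FA under Visual Studio) and compare the two function bodies; whether the loop is turned into a memset call will depend on the compiler and the optimization level.

```c
#include <string.h>

enum { BUF_SIZE = 500 };  /* same size as the array in the question */

/* Explicit library call. */
void clear_with_memset(char *x)
{
    memset(x, 0, BUF_SIZE);
}

/* Hand-written loop; an optimizing compiler may or may not
   replace this with a call to memset or with wide stores. */
void clear_with_loop(char *x)
{
    for (int i = 0; i < BUF_SIZE; ++i)
        x[i] = 0;
}
```

Putting each variant in its own function taking a pointer keeps the compiler from discarding the stores as dead code, so both versions show up in the listing.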

Ed S. answered Oct 02 '22 08:10
