Is memset() more efficient than for loop in C?

Tags: performance, c, memset

Is memset() more efficient than a for loop?

Consider this code:

    char x[500];
    memset(x, 0, sizeof(x));

And this one:

    char x[500];
    for (int i = 0; i < 500; i++)
        x[i] = 0;

Which one is more efficient, and why? Is there a special hardware instruction for block-level initialization?


asked Sep 09 '11 21:09

David

2 Answers

Most certainly, memset will be much faster than that loop. Note that the loop handles one character at a time, whereas functions like memset are heavily optimized and set several bytes at a time, even using MMX and SSE instructions when available.
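To illustrate what "several bytes at a time" can mean, here is a minimal sketch in portable C. The name fill_wordwise is hypothetical and this is not how any particular C library implements memset; real implementations are usually hand-tuned per architecture. It only shows the idea of replicating the byte into a 64-bit word and storing a whole word per iteration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a library function): fill n bytes at dst with
   the byte value c, storing 8 bytes per iteration where possible. */
static void *fill_wordwise(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char byte = (unsigned char)c;

    /* Replicate the byte into every lane of a 64-bit word. */
    uint64_t word = UINT64_C(0x0101010101010101) * byte;

    /* Bulk phase: one 8-byte store per iteration. memcpy with a constant
       size keeps the store well defined for any alignment; compilers
       typically lower it to a single move instruction. */
    while (n >= sizeof word) {
        memcpy(p, &word, sizeof word);
        p += sizeof word;
        n -= sizeof word;
    }

    /* Tail phase: the remaining 0..7 bytes, one at a time. */
    while (n > 0) {
        *p++ = byte;
        n--;
    }
    return dst;
}
```

Calling fill_wordwise(x, 0, sizeof(x)) on the 500-byte array from the question would do 62 word stores followed by 4 single-byte stores, instead of 500 byte stores.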

I think the paradigmatic example of these optimizations, which usually go unnoticed, is the GNU C library's strlen function. One would think that it has at least O(n) performance, but it actually has O(n/4) or O(n/8) depending on the architecture (yes, I know, in big-O terms it is the same, but it actually takes a quarter or an eighth of the time). How? Tricky, but nicely: strlen.
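The word-at-a-time idea behind that kind of strlen can be sketched as follows. This is not glibc's code: it is a bounded, strnlen-style variant with hypothetical names (has_zero_byte, bounded_len_wordwise), written so that it never reads past the cap bytes the caller says are valid. The core is a well-known bit trick that tests eight bytes for a zero in a handful of operations.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the 8 bytes in v is zero (a well-known bit trick). */
static int has_zero_byte(uint64_t v)
{
    return ((v - UINT64_C(0x0101010101010101)) & ~v
            & UINT64_C(0x8080808080808080)) != 0;
}

/* Hypothetical strnlen-style helper: length of the string in s, looking
   at no more than cap bytes. */
static size_t bounded_len_wordwise(const char *s, size_t cap)
{
    size_t i = 0;

    /* Skip 8 bytes at a time while a whole word fits within cap and
       contains no zero byte. */
    while (cap - i >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);
        if (has_zero_byte(w))
            break;
        i += sizeof w;
    }

    /* Finish byte by byte: this locates the terminator inside the last
       word, or handles a tail shorter than a word. */
    while (i < cap && s[i] != '\0')
        i++;
    return i;
}
```

The loop still visits every byte of the string, so the complexity stays linear; the gain is that most of the work is done with one comparison per word rather than one per character.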

answered Oct 02 '22 08:10

Diego Sevilla

Well, why don't we take a look at the generated assembly code, compiled with full optimization under VS 2010?

    char x[500];
    char y[500];
    int i;

    memset(x, 0, sizeof(x) );
      003A1014  push        1F4h
      003A1019  lea         eax,[ebp-1F8h]
      003A101F  push        0
      003A1021  push        eax
      003A1022  call        memset (3A1844h)

And your loop...

    char x[500];
    char y[500];
    int i;

    for( i = 0; i < 500; ++i )
    {
        x[i] = 0;
          00E81014  push        1F4h
          00E81019  lea         eax,[ebp-1F8h]
          00E8101F  push        0
          00E81021  push        eax
          00E81022  call        memset (0E81844h)

          /* note that this is *replacing* the loop,
             not being called once for each iteration. */
    }

So, under this compiler, the generated code is exactly the same. memset is fast, and the compiler is smart enough to know that you are doing the same thing as calling memset once anyway, so it does it for you.

If the compiler actually left the loop as-is, it would likely be slower, because memset can set more than one byte at a time (at a minimum, you could unroll your loop a bit). You can assume that memset will be at least as fast as a naive implementation such as the loop. Try it under a debug build and you will notice that the loop is not replaced.

That said, it depends on what the compiler does for you. Looking at the disassembly is always a good way to know exactly what is going on.
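To repeat this comparison with another compiler, a small test file along these lines may help. The function names are hypothetical, and the buffer is passed in by the caller so the optimizer cannot simply discard the stores. Compiling it with, for example, gcc -O2 -S or cl /O2 /FA produces an assembly listing in which the two functions can be compared; whether the loop is turned into a memset call depends on the compiler, its version, and the optimization settings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function names, chosen for this example. Each zeroes a
   caller-supplied buffer so the compiler cannot discard the stores. */
void zero_with_memset(char *x, size_t n)
{
    memset(x, 0, n);
}

void zero_with_loop(char *x, size_t n)
{
    /* The naive byte-by-byte loop from the question. */
    for (size_t i = 0; i < n; ++i)
        x[i] = 0;
}
```

If the optimized listing shows the same call (or the same inline store sequence) in both functions, the two versions are equivalent for that compiler; an unoptimized build will typically keep the loop as written.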

answered Oct 02 '22 08:10

Ed S.
