Is memset()
more efficient than for
loop.
Considering this code:
char x[500]; memset(x,0,sizeof(x));
And this one:
char x[500]; for(int i = 0 ; i < 500 ; i ++) x[i] = 0;
Which one is more efficient and why? Is there any special instruction in hardware to do block level initialization.
Conclusions. List comprehensions are often not only more readable but also faster than using “for loops.” They can simplify your code, but if you put too much logic inside, they will instead become harder to read and understand.
All zeroing operations that the pool allocator performs and many structure/array initializations that InitAll performs end up going through the memset function. Memset is one of the hottest functions on the operating system and is already quite optimized as a result.
A simple loop is slightly faster for about 10-20 bytes and less (It's a single compare+branch, see OP_T_THRES ), but for larger sizes, memcpy is faster and portable.
memset can be faster since it is written in assembler, whereas std::fill is a template function which simply does a loop internally.
Most certainly, memset
will be much faster than that loop. Note how you treat one character at a time, but those functions are so optimized that set several bytes at a time, even using, when available, MMX and SSE instructions.
I think the paradigmatic example of these optimizations, that go unnoticed usually, is the GNU C library strlen
function. One would think that it has at least O(n) performance, but it actually has O(n/4) or O(n/8) depending on the architecture (yes, I know, in big O() will be the same, but you actually get an eighth of the time). How? Tricky, but nicely: strlen.
Well, why don't we take a look at the generated assembly code, full optimization under VS 2010.
char x[500]; char y[500]; int i; memset(x, 0, sizeof(x) ); 003A1014 push 1F4h 003A1019 lea eax,[ebp-1F8h] 003A101F push 0 003A1021 push eax 003A1022 call memset (3A1844h)
And your loop...
char x[500]; char y[500]; int i; for( i = 0; i < 500; ++i ) { x[i] = 0; 00E81014 push 1F4h 00E81019 lea eax,[ebp-1F8h] 00E8101F push 0 00E81021 push eax 00E81022 call memset (0E81844h) /* note that this is *replacing* the loop, not being called once for each iteration. */ }
So, under this compiler, the generated code is exactly the same. memset
is fast, and the compiler is smart enough to know that you are doing the same thing as calling memset
once anyway, so it does it for you.
If the compiler actually left the loop as-is then it would likely be slower as you can set more than one byte size block at a time (i.e., you could unroll your loop a bit at a minimum. You can assume that memset
will be at least as fast as a naive implementation such as the loop. Try it under a debug build and you will notice that the loop is not replaced.
That said, it depends on what the compiler does for you. Looking at the disassembly is always a good way to know exactly what is going on.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With