For an iterator ptr which is a pointer, std::fill_n(ptr, n, 0) should do the same thing as memset(ptr, 0, n * sizeof(*ptr)) (but see @KeithThompson's comment on this answer).
For a C++ compiler in C++11/C++14/C++17 mode, under which conditions can I expect these to be compiled to the same code? And when/if they don't compile to the same code, is there a significant performance difference with -O0? -O3?
Note: Of course some/most of the answer might be compiler-specific. I'm only interested in one or two specific compilers, but please write about the compiler(s) for which you know the answer.
The answer depends on your implementation of the standard library.
MSVC, for example, has several implementations of std::fill_n, selected by the type of the elements you're trying to fill.
If you call std::fill_n with a char*, signed char*, or unsigned char*, it calls memset directly to fill the array:
inline char *_Fill_n(char *_Dest, size_t _Count, char _Val)
    {   // copy char _Val _Count times through [_Dest, ...)
    _CSTD memset(_Dest, _Val, _Count);
    return (_Dest + _Count);
    }
If you call with another type, it will fill in a loop:
template<class _OutIt,
    class _Diff,
    class _Ty> inline
    _OutIt _Fill_n(_OutIt _Dest, _Diff _Count, const _Ty& _Val)
    {   // copy _Val _Count times through [_Dest, ...)
    for (; 0 < _Count; --_Count, (void)++_Dest)
        *_Dest = _Val;
    return (_Dest);
    }
The best way to determine the overhead on your particular compiler and standard library implementation would be to profile the code with both calls.
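As a rough illustration of what such a comparison could look like, here is a minimal timing sketch using std::chrono. The helper names (time_ns, fill_with_fill_n, fill_with_memset) are made up for this example, and a loop like this is only a starting point: an optimising compiler may merge or drop repeated fills, so treat the numbers with care and confirm them with a real profiler.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: calls `fill` on the buffer `reps` times and returns
// the elapsed wall-clock time in nanoseconds.
template <class F>
long long time_ns(F fill, std::vector<int>& buf, int reps)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i)
        fill(buf.data(), buf.size());
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// The two calls being compared, wrapped so they share one signature.
inline void fill_with_fill_n(int* p, std::size_t n) { std::fill_n(p, n, 0); }
inline void fill_with_memset(int* p, std::size_t n) { std::memset(p, 0, n * sizeof *p); }
```

Calling time_ns(fill_with_fill_n, buf, reps) and time_ns(fill_with_memset, buf, reps) on the same std::vector<int>, at each optimisation level you care about, gives two numbers to compare.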
For all scenarios where memset is appropriate (i.e. all your objects are PODs), you will most likely find that the two statements are equivalent when any level of optimisation is enabled.
For scenarios where memset is not appropriate, the comparison is moot because using memset would result in an incorrect program.
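One case where memset is not appropriate even for plain ints is a non-zero fill value, because memset writes a byte pattern rather than assigning elements. The sketch below shows this; same_as_memset is a hypothetical helper written for this example, not something from the standard library.

```cpp
#include <algorithm>
#include <cstring>

// Hypothetical helper: fills two small int arrays, one with std::fill_n and
// one with std::memset, and reports whether the resulting bytes are identical.
inline bool same_as_memset(int value)
{
    int a[4];
    int b[4];
    std::fill_n(a, 4, value);         // assigns `value` to each element
    std::memset(b, value, sizeof b);  // writes (unsigned char)value to each byte
    return std::memcmp(a, b, sizeof a) == 0;
}
```

same_as_memset(0) returns true, while same_as_memset(1) returns false on any platform where sizeof(int) > 1: with a 4-byte int, memset leaves each element equal to 0x01010101 rather than 1.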
You can easily check this for yourself using tools such as godbolt (and many others).
For example, on GCC 6.2 these two functions generate literally identical code at optimisation level -O3:
#include <algorithm>
#include <cstring>

__attribute__((noinline))
void test1(int (&x) [100])
{
    std::fill_n(&x[0], 100, 0);
}

__attribute__((noinline))
void test2(int (&x) [100])
{
    std::memset(&x[0], 0, 100 * sizeof(int));
}

int main()
{
    int x[100];
    test1(x);
    test2(x);
}
https://godbolt.org/g/JIwI5l