
What performance can I expect from std::fill_n(ptr, n, 0) relative to memset?

For an iterator ptr which is a pointer, std::fill_n(ptr, n, 0) should do the same thing as memset(ptr, 0, n * sizeof(*ptr)) (but see @KeithThompson's comment on this answer).

For a C++ compiler in C++11/C++14/C++17 mode, under which conditions can I expect these to be compiled to the same code? And when/if they don't compile to the same code, is there a significant performance difference with -O0? -O3?

Note: Of course some/most of the answer might be compiler-specific. I'm only interested in one or two specific compilers, but please write about the compiler(s) for which you know the answer.

einpoklum asked Dec 21 '16 17:12




2 Answers

The answer depends on your implementation of the standard library.

MSVC for example has several implementations of std::fill_n based on the types of what you're trying to fill.

If you call std::fill_n with a char*, signed char*, or unsigned char*, it calls memset directly to fill the array.

inline char *_Fill_n(char *_Dest, size_t _Count, char _Val)
{   // copy char _Val _Count times through [_Dest, ...)
    _CSTD memset(_Dest, _Val, _Count);
    return (_Dest + _Count);
}

If you call with another type, it will fill in a loop:

template<class _OutIt,
    class _Diff,
    class _Ty> inline
_OutIt _Fill_n(_OutIt _Dest, _Diff _Count, const _Ty& _Val)
{   // copy _Val _Count times through [_Dest, ...)
    for (; 0 < _Count; --_Count, (void)++_Dest)
        *_Dest = _Val;
    return (_Dest);
}

The best way to determine the overhead on your particular compiler and standard library implementation would be to profile the code with both calls.
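As a rough illustration of what such a measurement could look like, here is a minimal timing sketch. The helper names (time_fill, time_fill_n, time_memset) are made up for this example, and a serious comparison would need a proper benchmarking harness, warm-up runs, and several buffer sizes, so treat the numbers it produces as indicative only.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: runs `fill` over the buffer `reps` times and
// returns the elapsed wall-clock time in seconds.
template <class Fill>
double time_fill(std::vector<int>& buf, int reps, Fill fill) {
    assert(!buf.empty());
    using clock = std::chrono::steady_clock;
    volatile int sink = 0;
    const auto start = clock::now();
    for (int i = 0; i < reps; ++i) {
        buf[0] = i + 1;               // dirty the buffer before each pass
        fill(buf.data(), buf.size());
        sink = buf[buf.size() / 2];   // read back so the fill has an observable effect
    }
    const auto stop = clock::now();
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}

// Zero the buffer with std::fill_n.
double time_fill_n(std::vector<int>& buf, int reps) {
    return time_fill(buf, reps, [](int* p, std::size_t n) {
        std::fill_n(p, n, 0);
    });
}

// Zero the buffer with std::memset.
double time_memset(std::vector<int>& buf, int reps) {
    return time_fill(buf, reps, [](int* p, std::size_t n) {
        std::memset(p, 0, n * sizeof(int));
    });
}
```

Calling both functions on buffers of the same size, at the optimisation level you actually ship with, gives a first impression of whether the two calls differ on your toolchain.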

lcs answered Nov 15 '22 17:11


For all scenarios where memset is appropriate (i.e. all your objects are PODs), you will most likely find that the two statements compile to equivalent code when any level of optimisation is enabled.

For scenarios where memset is not appropriate, comparison is moot because the use of memset would result in an incorrect program.
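One simple case where the two calls stop being interchangeable, even for a plain int array, is a non-zero fill value: memset writes the value into every byte, while std::fill_n assigns it to every element. The sketch below illustrates this. The function names are invented for the example, and it assumes int is wider than one byte.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kCount = 4;  // hypothetical array length for the demo

// Element-wise fill: every int receives `value`.
std::array<int, kCount> filled_with_fill_n(int value) {
    std::array<int, kCount> a{};
    std::fill_n(a.data(), a.size(), value);
    return a;
}

// Byte-wise fill: every byte receives `value` converted to unsigned char.
std::array<int, kCount> filled_with_memset(int value) {
    std::array<int, kCount> a{};
    std::memset(a.data(), value, a.size() * sizeof(int));
    return a;
}

// True when both approaches produce the same array contents.
bool fills_agree(int value) {
    return filled_with_fill_n(value) == filled_with_memset(value);
}

void check_fills() {
    assert(fills_agree(0));   // zero: all bytes zero either way
    assert(!fills_agree(1));  // one: memset yields ints made of 0x01 bytes
}
```

So the equivalence discussed here is really about zero-filling (or other values whose object representation is a single repeated byte).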

You can easily check for yourself using tools such as godbolt (and many others):

For example, with GCC 6.2 these two functions generate literally identical code at optimisation level -O3:

#include <algorithm>
#include <cstring>

__attribute__((noinline))
void test1(int (&x)[100])
{
  std::fill_n(&x[0], 100, 0);
}

__attribute__((noinline))
void test2(int (&x)[100])
{
  std::memset(&x[0], 0, 100 * sizeof(int));
}

int main()
{
  int x[100];
  test1(x);
  test2(x);
}

https://godbolt.org/g/JIwI5l

Richard Hodges answered Nov 15 '22 19:11