
Why would the behavior of std::memcpy be undefined for objects that are not TriviallyCopyable?

Tags:

c++

c++11

language-lawyer

memcpy

object-lifetime

From http://en.cppreference.com/w/cpp/string/byte/memcpy:

If the objects are not TriviallyCopyable (e.g. scalars, arrays, C-compatible structs), the behavior is undefined.

At my work, we have used std::memcpy for a long time to bitwise swap objects that are not TriviallyCopyable using:

#include <cstring>  // std::memcpy

void swapMemory(Entity* ePtr1, Entity* ePtr2) {
    static const int size = sizeof(Entity);
    char swapBuffer[size];
    std::memcpy(swapBuffer, ePtr1, size);
    std::memcpy(ePtr1, ePtr2, size);
    std::memcpy(ePtr2, swapBuffer, size);
}

and never had any issues.

I understand that it is trivial to abuse std::memcpy with non-TriviallyCopyable objects and cause undefined behavior downstream. However, my question:

Why would the behavior of std::memcpy itself be undefined when used with non-TriviallyCopyable objects? Why does the standard deem it necessary to specify that?

UPDATE

The contents of http://en.cppreference.com/w/cpp/string/byte/memcpy have been modified in response to this post and the answers to the post. The current description says:

If the objects are not TriviallyCopyable (e.g. scalars, arrays, C-compatible structs), the behavior is undefined unless the program does not depend on the effects of the destructor of the target object (which is not run by memcpy) and the lifetime of the target object (which is ended, but not started by memcpy) is started by some other means, such as placement-new.

Comment by @Cubbi:

@RSahu if something guarantees UB downstream, it renders the entire program undefined. But I agree that it appears to be possible to skirt around UB in this case and modified cppreference accordingly.

436

asked Apr 21 '15 16:04

R Sahu

2 Answers

Why would the behavior of std::memcpy itself be undefined when used with non-TriviallyCopyable objects?

It's not! However, once you copy the underlying bytes of one object of a non-trivially copyable type into another object of that type, the target object is not alive. We destroyed it by reusing its storage, and haven't revitalized it by a constructor call.

Using the target object (calling its member functions, accessing its data members) is clearly undefined ([basic.life]/6), and so is a subsequent implicit destructor call ([basic.life]/4) for target objects with automatic storage duration. Note that undefined behavior is retrospective. [intro.execution]/5:

However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

If an implementation spots that an object is dead and necessarily subject to further operations that are undefined, ... it may react by altering your program's semantics, from the memcpy call onward. This consideration becomes very practical once we think of optimizers and the assumptions they make.

It should be noted, though, that standard libraries are able and allowed to optimize certain standard library algorithms for trivially copyable types. std::copy on pointers to trivially copyable types usually ends up calling memcpy (or memmove) on the underlying bytes, and swap can be optimized the same way.
So simply stick to the normal generic algorithms and let the library and compiler do any appropriate low-level optimizations. This is partly what the concept of a trivially copyable type was invented for in the first place: determining the legality of certain optimizations. It also avoids hurting your brain by having to worry about contradictory and underspecified parts of the language.

175

answered Oct 26 '22 03:10

Columbo

It is easy enough to construct a class where that memcpy-based swap breaks:

struct X {
    int x;
    int* px; // invariant: always points to x
    X() : x(), px(&x) {}
    X(X const& b) : x(b.x), px(&x) {}
    X& operator=(X const& b) { x = b.x; return *this; }
};

memcpying such an object breaks that invariant: after the byte-wise swap, each object's px points to the other object's x instead of its own.

GNU libstdc++'s C++11 std::string does exactly that with short strings: its data pointer points into a small buffer inside the string object itself.

This is similar to how the standard file and string streams are implemented. The streams eventually derive from std::basic_ios, which contains a pointer to std::basic_streambuf. The streams also contain the specific buffer as a member (or base-class sub-object), and that pointer in std::basic_ios points to it.

answered Oct 26 '22 02:10

Maxim Egorushkin
