
Why isn't memcpy guaranteed to be safe for non-POD types?

Tags:

c++

I came across this paragraph in a few questions posted on SO.

I can't quite figure out why memcpy isn't guaranteed to be safe for a non-POD type. My understanding is that memcpy is just a bit-wise copy.

Below is a quote from the standard:

For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char.41) If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.

#define N sizeof(T)
char buf[N];
T obj;                      // obj initialized to its original value
std::memcpy(buf, &obj, N);  // between these two calls to std::memcpy,
                            // obj might be modified
std::memcpy(&obj, buf, N);  // at this point, each subobject of obj of
                            // scalar type holds its original value
ROTOGG asked Jul 23 '13 12:07


5 Answers

In general the problem is that objects introduce not only data, but also behaviors.

By copying the data manually we may break the inherent behavior of the object, which may rely on the copy constructor.

A great example would be any shared or unique pointer: by copying its bytes we break the "deal" we made with that class when we used it.

Regardless of whether the byte copy happens to be semantically correct, the idea behind it is wrong and violates the object-oriented programming paradigm.

Sample code:

#include <pthread.h>

/** a simple object wrapper around a pthread_mutex
 */
class PThreadMutex
{
   public:
    /** locks the mutex. Will block if mutex is already locked */
    void lock();

    /** unlocks the mutex. undefined behavior if mutex is unlocked */
    void unlock();

   private:
    pthread_mutex_t m_mutex;

};

/** a simple implementation of scoped mutex lock. Acquires and locks a Mutex on creation,
 * unlocks on destruction
 */
class ScopedLock
{
  public:
    /** constructor specifying the mutex object pointer to lock
     * Locks immediately or blocks until lock is free and then locks
     * @param mutex the mutex pointer to lock
     */
    ScopedLock ( PThreadMutex* mutex );

    /** destructor. Unlocks the mutex if it is still locked */
    ~ScopedLock ();

    /** unlocks the mutex early, before the destructor runs */
    void unlock();


  private:

    PThreadMutex* m_mutex;

    // flag to determine whether the mutex is locked
    bool m_locked;

    // private, unimplemented copy constructor and assignment - disable copying
    ScopedLock(const ScopedLock&);
    ScopedLock& operator=(const ScopedLock&);

};

If you memcpy a ScopedLock object into a buffer, manually unlock it, and then restore the saved bytes, the restored m_locked flag claims the mutex is still held. The destructor then performs a second unlock, which results in undefined behavior (or at least an EPERM error in the destructor).
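The scenario can be sketched without pthreads. This is only an illustration: CountingMutex is a hypothetical stand-in that counts calls, and the ScopedLock bodies are my assumption of what the declarations above would do. Writing bytes back into a type like this is not sanctioned by the standard, so the behavior shown is what typical implementations do, not a guarantee.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for PThreadMutex that only counts calls.
struct CountingMutex {
    int locks = 0;
    int unlocks = 0;
    void lock()   { ++locks; }
    void unlock() { ++unlocks; }
};

// Same shape as ScopedLock above: a mutex pointer plus a "locked" flag.
// The member function bodies are assumed, not taken from the answer.
class ScopedLock {
  public:
    explicit ScopedLock(CountingMutex* mutex) : m_mutex(mutex), m_locked(true) {
        m_mutex->lock();
    }
    ~ScopedLock() { if (m_locked) m_mutex->unlock(); }
    void unlock() {
        if (m_locked) { m_mutex->unlock(); m_locked = false; }
    }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;

  private:
    CountingMutex* m_mutex;
    bool m_locked;
};

// Returns how many unlock() calls were made for a single lock() call.
int unlocks_after_byte_restore() {
    CountingMutex mutex;
    {
        ScopedLock guard(&mutex);
        unsigned char saved[sizeof(ScopedLock)];
        std::memcpy(saved, &guard, sizeof guard);  // snapshot taken while m_locked is true
        guard.unlock();                            // first unlock, m_locked becomes false
        // Restoring the bytes bypasses the class invariants; shown only to
        // illustrate the failure mode.
        std::memcpy(static_cast<void*>(&guard), saved, sizeof guard);
    }   // the destructor sees the stale flag and unlocks again
    return mutex.unlocks;
}
```

With a real pthread_mutex_t the second unlock is where the undefined behavior or EPERM would show up; the counter just makes the extra call visible.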

Dariusz answered Nov 04 '22 23:11


Imagine a class that holds some pointer to a buffer like this:

#include <string.h>  // memcpy, size_t

class Abc {
    public:
    int* data;
    size_t n;
    Abc(size_t n)
    {
        this->n = n;
        data = new int[n];
    }

    // copy constructor:
    Abc(const Abc& copy_from_me)
    {
        n = copy_from_me.n;
        data = new int[n];
        memcpy(data, copy_from_me.data, n*sizeof(int));
    }
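    Abc& operator=(const Abc& copy_from_me)
    {
        if (this != &copy_from_me)  // guard against self-assignment
        {
            // allocate and fill the new buffer before touching the old one
            int* new_data = new int[copy_from_me.n];
            memcpy(new_data, copy_from_me.data, copy_from_me.n*sizeof(int));
            delete[] data;  // release the old buffer so it does not leak
            data = new_data;
            n = copy_from_me.n;
        }
        return *this;
    }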

    ~Abc()
    {
        delete[] data;
    }
} ;

If you just memcpy one of its constructed instances, you'll get two instances pointing to the same buffer, because the data pointer in both holds the same address. If you modify the data through one instance, the change is visible through the other too.

This means you did not truly clone it into two independent objects. Moreover, when both objects are destroyed, the buffer is freed twice, which is undefined behavior and typically crashes. So the class needs a copy constructor, and you have to copy it through that constructor rather than byte by byte.
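A minimal sketch of the aliasing, under some assumptions: RawBuffer is a hypothetical struct with the same two data members as Abc but no constructors or destructor, so that memcpy on it is well-defined and nothing gets freed twice. shallow_copy models the byte copy, deep_copy models what Abc's copy constructor does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical struct mirroring Abc's data members, without any
// constructors or destructor, so std::memcpy on it is well-defined.
struct RawBuffer {
    int* data;
    std::size_t n;
};

// Byte-wise copy: duplicates the pointer value, not the ints it points to.
RawBuffer shallow_copy(const RawBuffer& src) {
    RawBuffer dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}

// What Abc's copy constructor does: allocate a new array, copy the elements.
// The caller owns the returned buffer and must delete[] it.
RawBuffer deep_copy(const RawBuffer& src) {
    RawBuffer dst;
    dst.n = src.n;
    dst.data = new int[src.n];
    std::memcpy(dst.data, src.data, src.n * sizeof(int));
    return dst;
}

// Returns true if a write through `copy` is visible through `original`.
bool write_is_shared(const RawBuffer& original, const RawBuffer& copy) {
    original.data[0] = 1;
    copy.data[0] = 2;
    return original.data[0] == 2;
}
```

For the shallow copy write_is_shared reports true, for the deep copy false. With the real Abc, the shallow case additionally ends with two destructors calling delete[] on the same pointer.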

nio answered Nov 04 '22 23:11


Try bit-wise copying std::shared_ptr<>. You might find that your program blows up in your face more often than not.

You'll encounter this problem with any class whose copy constructor does something other than a bit-wise copy. In the case of std::shared_ptr<>, a bit-wise copy duplicates the pointer but doesn't increment the reference count, so you'll end up freeing the shared object and its reference count early, and then blowing up when the copied shared_ptr tries to decrement the freed reference count.
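For contrast, here is a small sketch of what the real copy constructor does, which a byte copy would skip. The function name use_count_after_copy is just an illustrative helper; the static_assert confirms that shared_ptr is outside the set of types for which memcpy is guaranteed to work.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// std::shared_ptr has a user-provided copy constructor and destructor,
// so it is not trivially copyable and memcpy gives no guarantees for it.
static_assert(!std::is_trivially_copyable<std::shared_ptr<int>>::value,
              "shared_ptr must be copied through its copy constructor");

// Returns the reference count observed after making one proper copy.
long use_count_after_copy() {
    std::shared_ptr<int> first = std::make_shared<int>(42);
    std::shared_ptr<int> second = first;  // copy constructor increments the count
    return first.use_count();
}
```

A bit-wise duplicate would leave the count at 1 while two objects believe they own the int, which is exactly the early free described above.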


UPDATE: It was pointed out that this doesn't quite answer the question, which is fair, since I mainly addressed the idea of copying shared_ptr to shared_ptr, not shared_ptr to char[] and back again. However, the principle still holds.

If you bit-wise copy a shared_ptr to a char[], assign a different value to the shared_ptr, then copy the char[] back over, the end result may be to leak one object and double-delete another, i.e., UB.

The same might happen with a POD, but that would be a bug in the program logic. Bit-wise copying back into the POD equivalent of a modified shared_ptr would be perfectly valid as long as the program understands and accommodates such an event. Doing so for a std::shared_ptr generally won't work.

Marcelo Cantos answered Nov 04 '22 23:11


Suppose, for example, that you are writing a String class. Each instance of the class holds a pointer to some dynamically allocated char array. If you memcpy such an instance, the two pointers will be equal, so any modification of one string will affect the other one.

hivert answered Nov 04 '22 22:11


C++11 note: The quote in the question is a rather old version of the rule. Since C++11, the requirement is that the type be trivially copyable, which is a much weaker requirement than POD.
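To make the difference concrete, here is a sketch with a hypothetical type Sample that is trivially copyable but not POD (its user-provided default constructor makes it non-trivial). The type traits come from <type_traits>; the round trip mirrors the snippet quoted in the question.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical type: the user-provided default constructor makes it
// non-trivial (hence not POD), but copying it is still trivial.
struct Sample {
    int id;
    int value;
    Sample() : id(0), value(0) {}
    Sample(int i, int v) : id(i), value(v) {}
};

static_assert(std::is_trivially_copyable<Sample>::value,
              "the memcpy round trip is covered by the C++11 rule");
static_assert(!std::is_trivially_default_constructible<Sample>::value,
              "not a trivial type, so not POD");

// Saves the bytes, overwrites the object, then restores the bytes.
Sample round_trip(Sample obj) {
    unsigned char buf[sizeof(Sample)];
    std::memcpy(buf, &obj, sizeof obj);
    obj = Sample();                      // modify obj between the two calls
    std::memcpy(&obj, buf, sizeof obj);  // obj holds its original value again
    return obj;
}
```

Under the older POD wording this type would not have been covered; under the trivially-copyable wording it is.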


memcpy can be used to read from any object. You get a bitwise image of the object.

If the object is not POD, then the image cannot be used as if it were the same type as the original object, because the lifetime rules require initialization to complete first.

In such cases, the image is merely a bunch of bytes. That might still be useful, for example to detect changes in the internal representation of an object over time, but only operations valid on bytes (such as comparison between two images) are legal, and not operations that require an object of the original type.
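A sketch of that "bytes only" use, assuming a hypothetical non-POD class Counter: the image is copied out with memcpy and only ever compared with memcmp, never turned back into a Counter.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Hypothetical non-POD type: the user-provided destructor makes it
// non-trivially-copyable.
class Counter {
  public:
    Counter() : value_(0) {}
    ~Counter() {}
    void bump() { ++value_; }

  private:
    int value_;
};

using Image = std::array<unsigned char, sizeof(Counter)>;

// Copies the object's bytes out; the result is just bytes, not a Counter.
Image snapshot(const Counter& c) {
    Image img;
    std::memcpy(img.data(), &c, sizeof c);
    return img;
}

// Byte-level comparison is one of the operations that stays valid on an image.
bool same_bytes(const Image& a, const Image& b) {
    return std::memcmp(a.data(), b.data(), a.size()) == 0;
}
```

Taking a snapshot before and after bump() and comparing them detects that the internal representation changed, without ever treating the buffer as an object of the original type.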

Ben Voigt answered Nov 04 '22 21:11