Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reallocate array with memcpy and memset

I've taken over some code, and came across a weird reallocation of an array. This is a function from within an Array class (used by the JsonValue)

void reserve( uint32_t newCapacity ) {
    if ( newCapacity > length + additionalCapacity ) {
        newCapacity = std::min( newCapacity, length + std::numeric_limits<decltype( additionalCapacity )>::max() );
        JsonValue *newPtr = new JsonValue[newCapacity];

        if ( length > 0 ) {
            memcpy( newPtr, values, length * sizeof( JsonValue ) );
            memset( values, 0, length * sizeof( JsonValue ) );
        }

        delete[] values;

        values = newPtr;
        additionalCapacity = uint16_t( newCapacity - length );
    }
}

I get the point of this; it is just allocating a new array, and doing a copy of the memory contents from the old array into the new array, then zero-ing out the old array's contents. I also know this was done in order to prevent calling destructors, and moves.

The JsonValue is a class with functions, and some data which is stored in a union (string, array, number, etc.).

My concern is whether this is actually defined behaviour or not. I know it works, and has not had a problem since we began using it a few months ago; but if its undefined then it doesn't mean it is going to keep working.

EDIT: JsonValue looks something like this:

struct JsonValue {
// …
    ~JsonValue() {
        switch ( details.type ) {
        case Type::Array:
        case Type::Object:
            array.destroy();
            break;
        case Type::String:
            delete[] string.buffer;
            break;
        default: break;
        }
    }
private:
    struct Details {
        Key key = Key::Unknown;
        Type type = Type::Null; // (0)
    };

    union {
        Array array;
        String string;
        EmbedString embedString;
        Number number;
        Details details;
    };
};

Where Array is a wrapper around an array of JsonValues, String is a char*, EmbedString is char[14], Number is a union of int, unsigned int, and double, Details contains the type of value it holds. All values have 16-bits of unused data at the beginning, which is used for Details. Example:

struct EmbedString {
    uint16_t : 16;
           char buffer[14] = { 0 };
};
like image 929
ChrisMM Avatar asked Dec 03 '19 10:12

ChrisMM


People also ask

What is the difference between memcpy and memset?

memcpy() copies from one place to another. memset() just sets all pieces of memory to the same. For example here sets string length of the string str to * (or whatever second argument of the memset).

What can I use instead of memcpy?

memmove() is similar to memcpy() as it also copies data from a source to destination.

Is memcpy inefficient?

memcpy is usually naive - certainly not the slowest way to copy memory around, but usually quite easy to beat with some loop unrolling, and you can go even further with assembler.

Do I need to free memory after memset?

Re: Do I need to release memory set by memset? memset does not allocate memory. It only fills a region of memory with a certain value. You do not have to free the buffer (unless the buffer was allocated).


1 Answers

Whether this code has well-defined behavior basically depends on two things: 1) is JsonValue trivially-copyable and, 2) if so, are a bunch of all-zero Bytes a valid object representation for a JsonValue.

If JsonValue is trivially-copyable, then the memcpy from one array of JsonValues to another will indeed be equivalent to copying all the elements over [basic.types]/3. If all-zeroes is a valid object representation for a JsonValue, then the memset should be ok (I believe this actually falls into a bit of a grey-area with the current wording of the standard, but I believe at least the intention would be that this is fine).

I'm not sure why you'd need to "prevent calling destructors and moves", but overwriting objects with zeroes does not prevent destructors from running. delete[] values will call the destructurs of the array members. And moving the elements of an array of trivially-copyable type should compile down to just copying over the bytes anyways.

Furthermore, I would suggest to get rid of these String and EmbedString classes and simply use std::string. At least, it would seem to me that the sole purpose of EmbedString is to manually perform small string optimization. Any std::string implementation worth its salt is already going to do exactly that under the hood. Note that std::string is not guaranteed (and will often not be) trivially-copyable. Thus, you cannot simply replace String and EmbedString with std::string while keeping the rest of this current implementation.

If you can use C++17, I would suggest to simply use std::variant instead of or at least inside this custom JsonValue implementation as that seems to be exactly what it's trying to do. If you need some common information stored in front of whatever the variant value may be, just have a suitable member holding that information in front of the member that holds the variant value rather than relying on every member of the union starting with the same couple of members (which would only be well-defined if all union members are standard-layout types that keep this information in their common initial sequence [class.mem]/23).

The sole purpose of Array would seem to be to serve as a vector that zeroes memory before deallocating it for security reasons. If this is the case, I would suggest to just use an std::vector with an allocator that zeros memory before deallocating instead. For example:

template <typename T>
struct ZeroingAllocator
{
    using value_type = T;

    T* allocate(std::size_t N)
    {
        return reinterpret_cast<T*>(new unsigned char[N * sizeof(T)]);
    }

    void deallocate(T* buffer, std::size_t N) noexcept
    {
        auto ptr = reinterpret_cast<volatile unsigned char*>(buffer);
        std::fill(ptr, ptr + N, 0);
        delete[] reinterpret_cast<unsigned char*>(buffer);
    }
};

template <typename A, typename B>
bool operator ==(const ZeroingAllocator<A>&, const ZeroingAllocator<B>&) noexcept { return true; }

template <typename A, typename B>
bool operator !=(const ZeroingAllocator<A>&, const ZeroingAllocator<B>&) noexcept { return false; }

and then

using Array = std::vector<JsonValue, ZeroingAllocator<JsonValue>>;

Note: I fill the memory via volatile unsigned char* to prevent the compiler from optimizing away the zeroing. If you need to support overaligned types, you can replace the new[] and delete[] with direct calls to ::operator new and ::operator delete (doing this will prevent the compiler from optimizing away allocations). Pre C++17, you will have to allocate a sufficiently large buffer and then manually align the pointer, e.g., using std::align

like image 132
Michael Kenzel Avatar answered Nov 10 '22 14:11

Michael Kenzel