
Can I use memcpy to write to multiple adjacent Standard Layout sub-objects?

Disclaimer: This is trying to drill down on a larger problem, so please don't get hung up on whether the example makes any sense in practice.

And, yes, if you want to copy objects, please use / provide the copy constructor. (But note that the example does not copy a whole object; it tries to blit some memory over a few adjacent (Q.2) integers.)


Given a C++ Standard Layout struct, can I use memcpy to write to multiple (adjacent) sub-objects at once?

Complete example (https://ideone.com/1lP2Gd, https://ideone.com/YXspBk):

#include <vector>
#include <iostream>
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <string.h>   // memcpy; <memory.h> is not a standard header

struct MyStandardLayout {
    char mem_a;
    int16_t num_1;
    int32_t num_2;
    int64_t num_3;
    char mem_z;

    MyStandardLayout()
    : mem_a('a')
    , num_1(1 + (1 << 14))
    , num_2(1 + (1 << 30))
    , num_3(1LL + (1LL << 62))
    , mem_z('z')
    { }

    void print() const {
        std::cout << 
            "MySL Obj: " <<
            mem_a << " / " <<
            num_1 << " / " <<
            num_2 << " / " <<
            num_3 << " / " <<
            mem_z << "\n";
    }
};

void ZeroInts(MyStandardLayout* pObj) {
    const size_t first = offsetof(MyStandardLayout, num_1);
    const size_t third = offsetof(MyStandardLayout, num_3);
    std::cout << "ofs(1st) =  " << first << "\n";
    std::cout << "ofs(3rd) =  " << third << "\n";
    assert(third > first);
    const size_t delta = third - first;
    std::cout << "delta =  " << delta << "\n";
    const size_t sizeAll = delta + sizeof(MyStandardLayout::num_3);
    std::cout << "sizeAll =  " << sizeAll << "\n";

    std::vector<char> buf( sizeAll, 0 );
    memcpy(&pObj->num_1, &buf[0], sizeAll);
}

int main()
{
    MyStandardLayout obj;
    obj.print();
    ZeroInts(&obj);
    obj.print();

    return 0;
}

Given the wording in the C++ Standard:

9.2 Class Members

...

13 Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object. (...) Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; (...)

I would conclude that it is guaranteed that num_1 to num_3 have increasing addresses and are adjacent modulo padding.

For the above example to be fully defined, I see the following requirements, and I am not sure whether they hold:

  • memcpy must be allowed to write to multiple "memory objects" in this way at once, i.e. specifically

    • Calling memcpy with the target address of num_1 and a size larger than the size of the num_1 "object" is legal, even though num_1 is not part of an array. (The question "Is memcpy(&a + 1, &b + 1, 0) defined in C11?" seems closely related, but doesn't quite fit.)
    • The C++14 Standard, AFAICT, defers to the C99 Standard for the description of memcpy, which states:

    7.21.2.1 The memcpy function

    2 The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.

    So the question here is whether the target range can be considered "an object" according to the C or C++ Standard. Note: a (part of an) array of chars, declared and defined as such, can certainly be considered "an object" for the purposes of memcpy, because I'm pretty sure I'm allowed to copy from one part of a char array to another part of (another) char array.

    So the question becomes whether it is legal to reinterpret the memory range of the three members as a "conceptual"(?) char array.

  • Calculating sizeAll is legal, i.e. the use of offsetof as shown is legal.

  • Writing to the padding in between the members is legal.

Do these properties hold? Have I missed anything else?

asked Aug 18 '16 at 20:08 by Martin Ba

2 Answers

§8.5 [dcl.init], on zero-initialization:

(6.2) — if T is a (possibly cv-qualified) non-union class type, each non-static data member and each base-class subobject is zero-initialized and padding is initialized to zero bits;

Now, the standard does not actually say that these zero bits will be writable, but I can't think of an architecture with this level of granularity in its memory access permissions (nor would we want one).

So I would say that, in practice, rewriting these zeros will always be safe, even if the Powers that Be have not explicitly declared it so.

answered Sep 21 '22 at 14:09 by Richard Hodges


is legal to reinterpret the memory range of the three members as a "conceptual"(?) char array

No, arbitrary subsets of members of objects are not themselves objects of any kind. If you can't apply sizeof to something, it's not a thing. Similarly, as suggested by the link you provided, if you can't name the thing to std::is_standard_layout, it's not a thing.

An analogous case would be:

size_t n = (char*)&num_3 - (char*)&num_1;

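It would compile, but it's UB: two pointers may only be subtracted if they point into the same array object (or one past its end), and num_1 and num_3 are not elements of a common array.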

That said, I think you're in safe territory even if the standard isn't explicit. If MyStandardLayout is a standard-layout type, it stands to reason that a subset of it is too, even if that subset has no name and is not an identifiable type of its own.

But I wouldn't do it. Assignment is absolutely safe, and potentially faster than memcpy. If the subset is meaningful and has many members, I would consider making it an explicit struct and using assignment instead of memcpy, taking advantage of the member-wise copy assignment operator that the compiler defines implicitly.

answered Sep 19 '22 at 14:09 by James K. Lowden