
Why don't C++ compilers replace this access to a const class member with its value known at compile time?

In this code snippet, why don't C++ compilers just return 1 when compiling test(), instead of reading the value from memory?

struct Test {
  const int x = 1;

  // Do not allow initializing x with a different value
  Test() {}
};
int test(const Test& t) {
  return t.x; 
}

Code on godbolt

Compiler output:

test(Test const&):                         # @test(Test const&)
    mov     eax, dword ptr [rdi]
    ret

I would have expected:

test(Test const&):                         # @test(Test const&)
    mov     eax, 1
    ret

Is there any standard-compliant way to modify the value of Test::x to contain a different value than 1? Or would the compilers be allowed to do this optimization, but neither gcc nor clang have implemented it?

EDIT: Of course you immediately found the mistake I made in reducing this to a minimal example: the struct allowed aggregate initialization. I updated the code with an empty default constructor that prevents that. (Old code on godbolt)

Tobias asked Dec 03 '22 09:12


2 Answers

I believe it's because you can still construct an instance of Test with other values of x with an initializer list like this:

Test x{2};
cout << test(x);

Demo: https://www.ideone.com/7vlCmX
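As a minimal sketch of that point, assuming the original form of the struct (before the empty constructor was added), aggregate initialization can override the default member initializer. The names AggregateTest, demo_default and demo_override are made up for this illustration:

```cpp
// Original form of the struct, with no user-declared constructor.
// That makes it an aggregate (C++14 and later allow default member
// initializers in aggregates).
struct AggregateTest {
    const int x = 1;
};

int test(const AggregateTest& t) {
    return t.x;  // cannot be folded to 1: a caller may pass x != 1
}

// Hypothetical helper: relies on the default member initializer.
int demo_default() {
    AggregateTest t{};     // no initializer given for x, so x == 1
    return test(t);
}

// Hypothetical helper: overrides the default via aggregate initialization.
int demo_override() {
    AggregateTest t{2};    // x == 2, even though x is const
    return test(t);
}
```

So for this version of the struct, test() has to load x from memory, because both objects are valid arguments.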

Al.G. answered Dec 23 '22 01:12


Now you've disallowed using a constructor to create an instance of a Test object with a different x value, but gcc/clang still aren't optimizing.

It may be legal to use char* or memcpy to create an object-representation of a Test object with a different x value, without violating the strict-aliasing rule. That would make the optimization illegal.

Update (see the discussion in comments): in the ISO standard, 6.8.4 [basic.type.qualifier] says "A const object is an object of type const T". That doesn't rule out the object being a sub-object, so getting at it via a pointer to the struct probably just counts as a non-const access path to a const object. ("Any attempt to modify a const object during its lifetime results in undefined behavior" doesn't leave room for loopholes, since this is an object, not a reference to an object.) So the char* and memcpy methods look to be UB, and even placement-new probably can't help: according to the Q&A "Placement new and assignment of class with const member", reuse is allowed only if "the type of the original object is not const-qualified".

(That language about not reusing the storage of a const object changed in C++20; it now leaves the door open for using placement-new on a whole struct/class object that's non-const, even if it contains const members.)

Manufacturing a brand new Test object with an arbitrary x value via std::bit_cast<Test>( int ) still appears to be fully legal even in ISO C++: Test is trivially copyable, which is what std::bit_cast requires. Also, it appears that real implementations such as GCC and clang define the behaviour for all these cases, at least de facto; I didn't check their official docs to see if it's mentioned as a GNU extension. As far as what the optimizer is allowed to assume, that's what matters.


This section hinges on some flimsy arguments / wishful thinking

   Test foo;
   *(char*)&foo = 3;  // change first byte of the object-representation
                      // which is where foo.x's value lives

In reference contexts in C++, const means you can't modify this object through this reference. I don't know how that applies for a const member of a non-const object.

This is a Standard Layout type, so it should be binary compatible with an equivalent C struct, and also safe to write/read to a file and back without UB. It's also trivially copyable, with or without a Test(const Test&) = default; copy-constructor, although that's probably not relevant. (It is not a POD / trivial type, though: the default member initializer and the user-provided Test() {} make the default constructor non-trivial.)
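The type properties being argued about here can be checked mechanically. This is a small sketch using <type_traits> on the struct from the question; the last assertion about size is an assumption that holds on mainstream ABIs rather than something the standard guarantees:

```cpp
#include <type_traits>

// The struct from the question, with the empty user-provided constructor.
struct Test {
    const int x = 1;
    Test() {}
};

// All non-static data members are in one class with the same access control.
static_assert(std::is_standard_layout<Test>::value,
              "Test is standard-layout");

// Copy/move constructors and the destructor are trivial; the copy
// assignment operator is deleted because of the const member.
static_assert(std::is_trivially_copyable<Test>::value,
              "Test is trivially copyable");
static_assert(!std::is_copy_assignable<Test>::value,
              "the const member deletes copy assignment");

// The user-provided constructor is not trivial, so Test is not a
// trivial type and therefore not a POD.
static_assert(!std::is_trivially_default_constructible<Test>::value,
              "Test() {} is user-provided");

// Assumption: no padding, as on typical x86-64 ABIs.
static_assert(sizeof(Test) == sizeof(int),
              "x is the whole object representation");
```

Note that a const member does not stop the type from being standard-layout or trivially copyable; it only removes assignment.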

If it's legal to write it out to a file and read it back, it should still be well-defined even if the file is modified in between. Or if we memcpy it to an array, modify the array, and copy back:

   Test foo;
   char buf[sizeof(foo)];
   memcpy(buf, &foo, sizeof(foo));
   buf[0] = 3;         // on a little-endian system like x86, these bytes now encode x == 3; the upper bytes stay 0
   memcpy(&foo, buf, sizeof(foo));

The only questionable step is the final memcpy back into foo; this is what creates a Test object with an x value the constructor couldn't produce.

@Klauss raised a concern about overwriting the whole object without destructing it and doing a placement-new of the new one. I thought that was allowed for Standard Layout POD types, but I haven't checked the standard. That should be allowed for a struct or class whose members are all non-const; that's the point of Standard Layout and POD / TrivialType. In any case, the char* version avoids doing that, not rewriting the whole object.

Does merely having a const member break the ability to write/read the object representation to a file? I don't think so; having a const member doesn't disqualify a type from being Standard Layout, Trivial, and even Trivially Copyable. (This point is the biggest stretch; but I still think it's legal unless someone can show me in the standard where it isn't legal to poke around in the object-representation of a non-const class object.)

It would be extremely weird if having or not-having a constructor that allowed different initializers for the const int x member was the difference between it being UB or not to write/read this object to a file and modify it. The inability to create a Test object with a different x value the "normal" way is a red herring as far as whether it's legal to poke around in the bytes of the object representation. (Although that is still a valid question for a class with a const member.)

And now we're back to non-hand-wavy stuff I think is still fully correct

@Tobias also commented with an example (https://godbolt.org/z/3abaEqWdM) that uses C++20 std::bit_cast to manufacture a Test object with x == 2; it is constexpr-safe and evaluates correctly even inside a static_assert.


We can also see from this example that GCC and clang leave room for non-inline function calls to modify that member of an already-constructed Test object:

void ext(void*);  // might do anything to the pointed-to memory

int test() {
    Test foo;    // construct with x=1
    ext (&foo);
    return foo.x;   // with ext() commented out,  mov eax, 1
}

Godbolt

# GCC11.2 -O3.  clang is basically equivalent.
test():
        sub     rsp, 24             # stack alignment + wasted 16 bytes
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], 1      # construct with x=1
        call    ext(void*)
        mov     eax, DWORD PTR [rsp+12]    # reload from memory, not mov eax, 1
        add     rsp, 24
        ret

It may or may not be a missed optimization. Many missed-optimizations are things compilers don't look for because it would be computationally expensive (even an ahead-of-time compiler can't use exponential-time algorithms carelessly on potentially-large functions).

This doesn't seem too expensive to look for, though: it's just checking whether a default member initializer has no way to be overridden. It does seem fairly low in value in terms of making faster / smaller code, since hopefully most code won't do this.

It's certainly a sub-optimal way to write code, because you're wasting space in each instance of the class holding this constant. So hopefully it doesn't appear often in real code-bases. static constexpr is idiomatic and much better than a const per-instance member object if you intentionally have a per-class constant.
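A rough side-by-side of the two styles, as a sketch: PerInstance mirrors the struct from the question, and PerClass is a hypothetical alternative using a static constexpr member. Both names and the read overloads are made up for this illustration:

```cpp
#include <type_traits>

// Per-instance constant: every object carries its own copy of x.
struct PerInstance {
    const int x = 1;
    PerInstance() {}
};

// Per-class constant: no storage inside the objects at all.
struct PerClass {
    static constexpr int x = 1;
};

// Reads the member through the reference, like test() in the question.
int read(const PerInstance& t) { return t.x; }

// The value is a compile-time constant of the class, so the compiler can
// use it directly without looking at the object.
int read(const PerClass&) { return PerClass::x; }

static_assert(sizeof(PerInstance) >= sizeof(int),
              "each PerInstance object stores x");
static_assert(std::is_empty<PerClass>::value,
              "PerClass has no non-static data members");
```

With the static constexpr version there is nothing for the optimizer to prove: the constant is part of the type, not of the object.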

However, constant-propagation can be very valuable, so even if it only happens rarely, it can open up major optimizations in the cases it does.

Peter Cordes answered Dec 23 '22 03:12