Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does std::memcpy make its destination determinate?

Here is the code:

unsigned int a;            // a is indeterminate
unsigned long long b = 1;  // b is initialized to 1
std::memcpy(&a, &b, sizeof(unsigned int));
unsigned int c = a;        // Is this not undefined behavior? (Implementation-defined behavior?)

Is a guaranteed by the standard to be a determinate value where we access it to initialize c? Cppreference says:

void* memcpy( void* dest, const void* src, std::size_t count );

Copies count bytes from the object pointed to by src to the object pointed to by dest. Both objects are reinterpreted as arrays of unsigned char.

But I don't see anywhere in cppreference that says if an indeterminate value is "copied to" like this, it becomes determinate.

From the standard, it seems it's analogous to this:

unsigned int a;            // a is indeterminate
unsigned long long b = 1;  // b is initialized to 1
auto* a_ptr = reinterpret_cast<unsigned char*>(&a);
auto* b_ptr = reinterpret_cast<unsigned char*>(&b);
a_ptr[0] = b_ptr[0];
a_ptr[1] = b_ptr[1];
a_ptr[2] = b_ptr[2];
a_ptr[3] = b_ptr[3];
unsigned int c = a;        // Is this undefined behavior? (Implementation defined behavior?)

It seems like the standard leaves room for this to be allowed, because the type aliasing rules allow for the object a to be accessed as an unsigned char this way. But I can't find something that says this makes a no longer indeterminate.

like image 614
Joel Avatar asked Jul 01 '19 23:07

Joel


2 Answers

Is this not undefined behavior

It's UB, because you're copying into the wrong type. [basic.types]2 and 3 permit byte copying, but only between objects of the same type. You copied from a long long into an int. That has nothing to do with the value being indeterminate. Even though you're only copying sizeof(int) bytes, the fact that you're not copying from an actual int means that you don't get the protection of those rules.

If you were copying into the value of the same type, then [basic.types]3 says that it's equivalent to simply assigning them. That is, a " shall subsequently hold the same value as" b.

like image 187
Nicol Bolas Avatar answered Oct 22 '22 20:10

Nicol Bolas


TL;DR: It's implementation-defined whether there will be undefined behavior or not. Proof-style, with lines of code numbered:


  1. unsigned int a;

The variable a is assumed to have automatic storage duration. Its lifetime begins (6.6.3/1). Since it is not a class, its lifetime begins with default initialization, in which no other initialization is performed (9.3/7.3).

  1. unsigned long long b = 1ull;

The variable b is assumed to have automatic storage duration. Its lifetime begins (6.6.3/1). Since it is not a class, its lifetime begins with copy-initialization (9.3/15).

  1. std::memcpy(&a, &b, sizeof(unsigned int));

Per 16.2/2, std::memcpy should have the same semantics and preconditions as the C standard library's memcpy. In the C standard 7.21.2.1, assuming sizeof(unsigned int) == 4, 4 characters are copied from the object pointed to by &b into the object pointed to by &a. (These two points are what is missing from other answers.)

At this point, the sizes of unsigned int, unsigned long long, their representations (e.g. endianness), and the size of a character are all implementation defined (to my understanding, see 6.7.1/4 and its note saying that ISO C 5.2.4.2.1 applies). I will assume that the implementation is little-endian, unsigned int is 32 bits, unsigned long long is 64 bits, and a character is 8 bits.

Now that I have said what the implementation is, I know that a has a value-representation for an unsigned int of 1u. Nothing, so far, has been undefined behavior.

  1. unsigned int c = a;

Now we access a. Then, 6.7/4 says that

For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.

I know now that the value of a is determined by the implementation-defined value bits in a, which I know hold the value-representation for 1u. The value of a is then 1u.

Then like (2), the variable c is copy-initialized to 1u.


We made use of implementation-defined values to find what happens. It is possible that the implementation-defined value of 1ull is not one of the implementation-defined set of values for unsigned int. In that case, accessing a will be undefined behavior, because the standard doesn't say what happens when you access a variable with a value-representation that is invalid.

AFAIK, we can take advantage of the fact that most implementations define an unsigned int where any possible bit pattern is a valid value-representation. Therefore, there will be no undefined behavior.

like image 22
Joel Avatar answered Oct 22 '22 21:10

Joel