Can memcpy be used for type punning?

Tags:

c

type-conversion

language-lawyer

This is a quote from the C11 Standard:

6.5 Expressions
...

6 The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.

Does this imply that memcpy cannot be used for type punning this way:

double d = 1234.5678;
uint64_t bits;
memcpy(&bits, &d, sizeof bits);
printf("the representation of %g is %08"PRIX64"\n", d, bits);

Why would it not give the same output as:

union { double d; uint64_t i; } u;
u.d = 1234.5678;
printf("the representation of %g is %08"PRIX64"\n", u.d, u.i);

What if I use my version of memcpy using character types:

void *my_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++) { d[i] = s[i]; }
    return dst;
}

EDIT: EOF commented that the part about memcpy() in paragraph 6 doesn't apply in this situation, since uint64_t bits has a declared type. I agree, but unfortunately this does not help answer the question of whether memcpy can be used for type punning; it just makes paragraph 6 irrelevant for assessing the validity of the above examples.

Here is another attempt at type punning with memcpy that I believe would be covered by paragraph 6:

double d = 1234.5678;
void *p = malloc(sizeof(double));
if (p != NULL) {
    uint64_t *pbits = memcpy(p, &d, sizeof(double));
    uint64_t bits = *pbits;
    printf("the representation of %g is %08"PRIX64"\n", d, bits);
}

Assuming sizeof(double) == sizeof(uint64_t), does the above code have defined behavior under paragraphs 6 and 7?

EDIT: Some answers point to the potential for undefined behavior coming from reading a trap representation. This is not relevant as the C Standard explicitly excludes this possibility:

7.20.1.1 Exact-width integer types

1 The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two’s complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits.

2 The typedef name uintN_t designates an unsigned integer type with width N and no padding bits. Thus, uint24_t denotes such an unsigned integer type with a width of exactly 24 bits.

These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a two’s complement representation, it shall define the corresponding typedef names.

Type uint64_t has exactly 64 value bits and no padding bits, thus there cannot be any trap representations.

asked Jul 27 '16 00:07

chqrlie

2 Answers

There are two cases to consider: memcpy()ing into an object that has a declared type, and memcpy()ing into an object that does not.

In the second case,

double d = 1234.5678;
void *p = malloc(sizeof(double));
assert(p);
uint64_t *pbits = memcpy(p, &d, sizeof(double));
uint64_t bits = *pbits;
printf("the representation of %g is %08"PRIX64"\n", d, bits);

The behavior is indeed undefined, since the effective type of the object pointed to by p will become double, and accessing an object of effective type double through an lvalue of type uint64_t is undefined.

On the other hand,

double d = 1234.5678;
uint64_t bits;
memcpy(&bits, &d, sizeof bits);
printf("the representation of %g is %08"PRIX64"\n", d, bits);

is not undefined. C11 draft standard n1570:

7.24.1 String function conventions
3 For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value).

And

6.5 Expressions
7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 88)

— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.

Footnote 88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.

So the memcpy() itself is well-defined.

Since uint64_t bits has a declared type, it retains its type even though its object representation was copied from a double.

As chqrlie points out, uint64_t cannot have trap representations, so accessing bits after the memcpy() is not undefined, provided sizeof(uint64_t) == sizeof(double). However, the value of bits will be implementation-dependent (for example due to endianness).

Conclusion: memcpy() can be used for type-punning, provided that the destination of the memcpy() does have a declared type, i.e. is not allocated by [m/c/re]alloc() or equivalent.
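
As a minimal sketch of this conclusion, the pun can be wrapped in small helpers so that the destination of each memcpy() is always an object with a declared type. The names `double_to_bits` and `bits_to_double` are made up for illustration, and the `static_assert` encodes the size assumption stated above:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Assumption taken from the answer: both types have the same size. */
static_assert(sizeof(double) == sizeof(uint64_t),
              "double and uint64_t must have the same size");

/* Hypothetical helper: copy the object representation of d into a
   uint64_t. The destination has a declared type, so its effective
   type is still uint64_t after the copy. */
uint64_t double_to_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Hypothetical inverse: copy the representation back into a double. */
double bits_to_double(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Since the bytes are copied unchanged in both directions, bits_to_double(double_to_bits(x)) compares equal to x for any non-NaN x. The numeric value of the intermediate uint64_t remains implementation-dependent, as noted above.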

answered Sep 16 '22 15:09

EOF

You propose three approaches, each of which has a different problem with respect to the C standard.

1. Standard library memcpy
```
double d = 1234.5678;
uint64_t bits;
memcpy(&bits, &d, sizeof bits);
printf("the representation of %g is %08"PRIX64"\n", d, bits);
```
The memcpy part is legal (provided that sizeof(double) == sizeof(uint64_t) in your implementation, which the standard does not guarantee): both objects are accessed through character pointers.

But the printf line is not. The representation stored in bits is now that of a double. It might be a trap representation for a uint64_t, as defined in 6.2.6.1 General §5:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

And 6.2.6.2 Integer types says explicitly:

For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits ... The values of any padding bits are unspecified.⁵³

With note 53 saying:

Some combinations of padding bits might generate trap representations,

If you know that your implementation has no padding bits (I have still never seen one that has any...), every representation is a valid value, and the printf line becomes valid again. But this is implementation-dependent, and the behaviour can be undefined in the general case.

2. union
```
union { double d; uint64_t i; } u;
u.d = 1234.5678;
printf("the representation of %g is %08"PRIX64"\n", u.d, u.i);
```
The members of the union do not share a common initial sequence, and you are accessing a member other than the one last written. Common implementations will give the expected result, but the standard does not explicitly define what should happen. A footnote in 6.5.2.3 Structure and union members §3 says that it leads to the same problem as the previous case:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

3. Custom memcpy

Your implementation only performs character accesses, which are always allowed. It is exactly the same as the first case: implementation-defined.
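
To compare the three approaches side by side, here is a small sketch that wraps each one in a function. The names `bits_memcpy`, `bits_union` and `bits_my_memcpy` are invented for this illustration, and `my_memcpy` is the byte-wise copy from the question. The sketch only shows that the three produce the same bits on a given implementation; it does not settle which of them is strictly conforming:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: both types have the same size on this implementation. */
static_assert(sizeof(double) == sizeof(uint64_t),
              "double and uint64_t must have the same size");

/* The question's copy routine: memory is accessed only as unsigned char. */
void *my_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
    return dst;
}

/* Approach 1: standard library memcpy into a declared uint64_t. */
uint64_t bits_memcpy(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Approach 2: write one union member, read the other. */
uint64_t bits_union(double d) {
    union { double d; uint64_t i; } u;
    u.d = d;
    return u.i;
}

/* Approach 3: the custom character-wise copy. */
uint64_t bits_my_memcpy(double d) {
    uint64_t bits;
    my_memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

All three functions read the same bytes of the double, so on an implementation where uint64_t has no padding bits they are expected to return identical values for the same argument.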

The only way that would be explicitly defined per the standard is to store the representation of the double in an unsigned char array of the correct size, and then display the byte values of that array:

double d = 1234.5678;
unsigned char bits[sizeof(d)];
memcpy(bits, &d, sizeof(bits));
printf("the representation of %g is ", d);
for (size_t i = 0; i < sizeof(bits); i++) {
    printf("%02x", (unsigned int) bits[i]);
}
printf("\n");

The result is only directly usable if the implementation uses exactly 8 bits for a char. Otherwise the difference would be visible, because a byte with a value greater than 255 would be displayed with more than two hexadecimal digits.
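
A variant of the loop above that does not depend on 8-bit bytes could derive the field width from CHAR_BIT. This is only a sketch: `format_bytes` and `HEX_DIGITS_PER_BYTE` are names invented here, and the output goes to a caller-supplied buffer instead of stdout so that it is easier to check:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hex digits needed to print one byte: 2 when CHAR_BIT is 8. */
#define HEX_DIGITS_PER_BYTE ((CHAR_BIT + 3) / 4)

/* Hypothetical helper: write the n bytes of the object at obj into out
   as zero-padded hex digits, in memory order. Returns the number of
   characters written (excluding the terminating NUL), or -1 if out is
   too small. */
int format_bytes(const void *obj, size_t n, char *out, size_t outsz) {
    const unsigned char *p = obj;
    size_t needed = n * HEX_DIGITS_PER_BYTE + 1;   /* digits plus NUL */

    assert(obj != NULL && out != NULL);
    if (outsz < needed) {
        return -1;
    }
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t off = i * HEX_DIGITS_PER_BYTE;
        /* Each byte is read through unsigned char, which is always allowed. */
        snprintf(out + off, outsz - off, "%0*x",
                 HEX_DIGITS_PER_BYTE, (unsigned int)p[i]);
    }
    return (int)(needed - 1);
}
```

With a buffer of sizeof(double) * HEX_DIGITS_PER_BYTE + 1 characters, calling format_bytes(&d, sizeof d, buf, sizeof buf) fills buf with the same digits the loop above prints.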

All of the above is only valid because bits has a declared type. Please see @EOF's answer to understand why it would be different for an allocated object.
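
For the allocated-object case, one way that appears to stay within the rules is to never read the allocated storage through a uint64_t lvalue, and instead copy its bytes into a declared uint64_t. The sketch below assumes equal sizes for the two types, and `heap_double_bits` is a hypothetical name:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: both types have the same size on this implementation. */
static_assert(sizeof(double) == sizeof(uint64_t),
              "double and uint64_t must have the same size");

/* Hypothetical helper: stores d in allocated storage, then extracts its
   bits. Returns 0 on success and -1 if the allocation fails. */
int heap_double_bits(double d, uint64_t *out) {
    uint64_t bits;                   /* object with a declared type */
    void *p = malloc(sizeof d);

    if (p == NULL) {
        return -1;
    }
    memcpy(p, &d, sizeof d);         /* effective type of *p becomes double */
    memcpy(&bits, p, sizeof bits);   /* bytes are copied as characters; *p is
                                        never read through a uint64_t lvalue */
    free(p);
    *out = bits;
    return 0;
}
```

The difference from the undefined variant is the missing `*pbits` dereference: the second memcpy() writes into bits, whose declared type is uint64_t, so the effective type of the allocated object no longer matters for the final read.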

answered Sep 17 '22 15:09

Serge Ballesta
