Efficient conversion of data from one integer type to another with the same representation

The majority of microcomputer C compilers have two signed integer types with the same size and representation, along with two such unsigned types. If int is 16 bits, its representation will generally match short; if long is 64 bits, it will generally match long long; otherwise, int and long will usually have matching 32-bit representations.

If, on a platform where long, long long, and int64_t have the same representation, one needs to pass a buffer to three API functions in order (assume the APIs are under the control of someone else and use the indicated types; if the functions could readily be changed, they could simply be changed to use the same type throughout)

void fill_array(long *dat, int size);
void munge_array(int64_t *dat, int size);
void output_array(long long *dat, int size);

is there any efficient standard-compliant way of allowing all three functions to use the same buffer without requiring that all of the data be copied between function calls? I doubt the authors of C's aliasing rules intended that such a thing should be difficult, but it is fashionable for "modern" compilers to assume that nothing written via long* will be read via long long*, even when those types have the same representation. Further, while int64_t will generally be the same as either long or long long, implementations are inconsistent as to which.

On compilers that don't aggressively pursue type-based aliasing through function calls, one could simply cast pointers to the proper types, perhaps including a static assertion to ensure that all types have the same size. The problem is that if a compiler like gcc, after expanding out function calls, sees that some storage is written as long and later read as long, without any intervening writes of type long, it may replace the later read with the value written as type long, even if there were intervening writes of type long long.
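
The static assertion mentioned above could be sketched roughly as follows, assuming C11 or later. The macro name CAN_SHARE_BUFFER is hypothetical and not part of any API. The check only confirms that the three types agree in range and size; it does not make the pointer casts conforming, so the cast-based path would still depend on something like the -fno-strict-aliasing option of GCC or Clang.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* LONG_MAX, LLONG_MAX */
#include <stdint.h>   /* int64_t, INT64_MAX */

/* CAN_SHARE_BUFFER is a hypothetical name: 1 when long, long long and
   int64_t cover the same range, 0 otherwise. */
#if LONG_MAX == LLONG_MAX && LONG_MIN == LLONG_MIN && LONG_MAX == INT64_MAX
#define CAN_SHARE_BUFFER 1
/* Equal ranges normally imply equal sizes; fail the build if they do not. */
static_assert(sizeof(long) == sizeof(long long),
              "long and long long differ in size");
static_assert(sizeof(long) == sizeof(int64_t),
              "long and int64_t differ in size");
#else
#define CAN_SHARE_BUFFER 0   /* caller should copy between calls instead */
#endif
```

Code that wants to share one buffer can test CAN_SHARE_BUFFER and fall back to copying when it is 0.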

Disabling type-based aliasing altogether is of course one approach to making such code work. Any decent compiler should allow that, and it will avoid many other possible pitfalls. Still, it seems like there should be a Standard-defined way to perform such a task efficiently. Is there?

supercat asked Sep 08 '16 16:09


1 Answer

is there any efficient standard-compliant way of allowing all three functions to use the same buffer without requiring that all of the data be copied between function calls? I doubt the authors of C's aliasing rules intended that such a thing should be difficult, but it is fashionable for "modern" compilers to assume that nothing written via long* will be read via long long*, even when those types have the same representation.

C specifies that long and long long are different types, even if they have the same representation. Regardless of representation, they are not "compatible types" in the sense defined by the standard. Therefore, the strict aliasing rule (C2011 6.5/7) applies: an object having effective type long shall not have its stored value accessed by an lvalue of type long long, and vice versa. Consequently, whatever the effective type of your buffer is, your program exhibits undefined behavior if it accesses elements both as type long and as type long long.

While I concur that the authors of the standard did not intend that what you describe should be hard, they also have no particular intention to make it easy. They are concerned above all with defining program behavior in a way that is, as far as possible, invariant with respect to all of the freedoms allowed to implementations, and among those freedoms is that long long can have a different representation from long. Therefore, no program that relies on them having the same representation can be strictly conforming, regardless of the nature or context of that reliance.

Still, it seems like there should be a Standard-defined way to perform such a task efficiently. Is there?

No. The effective type of the buffer is its declared type if it has one, or otherwise is defined by the manner in which its stored value was set. In the latter case, that might change if a different value is written, but any given value has only one effective type. Whatever its effective type is, the strict aliasing rule does not allow for the value to be accessed via lvalues both of type long and of type long long. Period.
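
To illustrate how the effective type of storage without a declared type follows the most recent store, here is a minimal sketch. The function name effective_type_demo is hypothetical, and the example uses malloc'd storage precisely because it has no declared type. Each read below uses the same type as the store that precedes it, so every access is defined.

```c
#include <assert.h>   /* assert, for callers that check the result */
#include <stdlib.h>   /* malloc, free */

/* Hypothetical demo. Returns 1 if both reads saw the expected values,
   0 if not, and -1 if the allocation failed. */
int effective_type_demo(void)
{
    /* Room for whichever of the two types is larger. */
    size_t n = sizeof(long) > sizeof(long long) ? sizeof(long)
                                                : sizeof(long long);
    void *raw = malloc(n);            /* allocated storage: no declared type */
    if (raw == NULL)
        return -1;

    long *as_long = raw;
    *as_long = 42L;                   /* store gives the object effective type long */
    long first = *as_long;            /* read as long: defined */

    long long *as_llong = raw;
    *as_llong = 43LL;                 /* new store: effective type is now long long */
    long long second = *as_llong;     /* read as long long: defined */

    /* Reading *as_long at this point would violate 6.5/7, because the
       stored value now has effective type long long. */

    free(raw);
    return first == 42L && second == 43LL;
}
```

A caller could check the result with assert(effective_type_demo() == 1). What the sketch cannot do is read the value 43 back through as_long, which is exactly the access the question needs.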

Disabling type-based aliasing altogether is of course one approach to making such code work. Any decent compiler should allow that, and it will avoid many other possible pitfalls.

Indeed, that or some other implementation-specific approach, possibly including It Just Works, are your only alternatives for sharing the same data among the three functions you present without copying.
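
For contrast, the portable route gives up on sharing and converts element by element between separately typed buffers. The following is only a sketch; the helper names copy_long_to_i64 and copy_i64_to_llong are hypothetical and not part of the APIs in the question. Where the types share a representation, an optimizing compiler may well reduce each loop to a plain block copy, though nothing guarantees that.

```c
#include <assert.h>   /* assert */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* int64_t */

/* Hypothetical helper: convert n elements from long to int64_t. */
void copy_long_to_i64(int64_t *dst, const long *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (int64_t)src[i];     /* value conversion, no aliasing involved */
}

/* Hypothetical helper: convert n elements from int64_t to long long. */
void copy_i64_to_llong(long long *dst, const int64_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (long long)src[i];   /* each object is accessed only as its own type */
}
```

With three buffers declared as long[], int64_t[], and long long[], one would call fill_array on the first, convert into the second for munge_array, and convert into the third for output_array.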

Update:

Under some restricted circumstances there may be a somewhat more standard-based solution. For example, with the specific API functions you designated, you could do something like this:

union buffer {
    long       l[BUFFER_SIZE];
    long long ll[BUFFER_SIZE];
    int64_t  i64[BUFFER_SIZE]; 
} my_buffer;

fill_array(my_buffer.l, BUFFER_SIZE);
munge_array(my_buffer.i64, BUFFER_SIZE);
output_array(my_buffer.ll, BUFFER_SIZE);

(Props to @Riley for giving me this idea, though it differs a bit from his.)

Of course that doesn't work if your API dynamically allocates the buffer itself. Note, too, that

  • A program using that approach may conform to the standard, but if it assumes the same representation for long, long long, and int64_t then it still does not strictly conform, as the standard defines that term.

  • The standard is a bit inconsistent on this point. Its remarks about allowing type punning via a union are in a footnote, and the footnotes are non-normative. The reinterpretation described in that footnote seems to contradict paragraph 6.5/7, which is normative. I prefer to keep my mission-critical code far away from uncertainties such as this, for even if we conclude that this approach should work, the uncertainty provides just the kind of cranny that compiler bugs like to lodge in.

  • A rather well-known figure in the field once had this to say about the issue:

Unions are not useful [for aliasing], regardless of what silly language lawyers say, since they are not a generic method. Unions only work for trivial and largely uninteresting cases, and it doesn't matter what C99 says about the issue, since that nasty thing called "real life" interferes.

John Bollinger answered Sep 18 '22 19:09