
Integer conversions (narrowing, widening), undefined behaviour

It was pretty difficult for me to find information on this subject in a manner that I could easily understand, so I'm asking for a review of what I have found. It's about conversion and conversion only.


In examples I will be referring to:

(signed/unsigned) int bigger;
(signed/unsigned) char smaller;
  1. Truncating integers. (bigger->smaller)

    • first truncate bigger on MSB side to match smaller size.
    • second, convert truncated result to signed/unsigned depending on smaller type.


    If the value of bigger is too big to fit in the smaller type, it results in undefined behaviour (correct me on that). However, my rule should work on all machines (correct me on that, too) and the results should be predictable.

  2. Widening integers (smaller->bigger)

    a) signed char -> signed int

    • prepend smaller with MSB (1 or 0) to match bigger size
    • convert to signed

    b) signed char -> unsigned int

    • prepend smaller with MSB (1 or 0) to match bigger size.
    • convert to unsigned

    c) unsigned char -> signed int

    • prepend with 0's to match bigger size
    • convert to signed

    d) unsigned char -> unsigned int

    • prepend with 0's to match bigger size
    • convert to unsigned

Where are the undefined/unspecified behaviours that I didn't mention and that could pop up?

zubergu asked Oct 09 '13 13:10


2 Answers

An integral conversion never produces undefined behaviour (it can produce implementation-defined behaviour).

A conversion to a type that can represent the value being converted is always well-defined: the value simply stays unchanged.

A conversion to an unsigned type is always well-defined: the value is reduced modulo one more than the maximum value of the target type (UINT_MAX+1 for unsigned int, UCHAR_MAX+1 for unsigned char, and so on).
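A minimal sketch of the modulo rule for unsigned targets; to_uchar and to_uint are hypothetical helper names used only for this illustration:

```c
#include <limits.h>

/* Hypothetical helper: convert any int to unsigned char.
   The result is the unique value in [0, UCHAR_MAX] that is congruent
   to v modulo UCHAR_MAX + 1, on every conforming implementation. */
unsigned char to_uchar(int v) { return (unsigned char)v; }

/* Same rule with unsigned int as the target: reduced modulo UINT_MAX + 1. */
unsigned int to_uint(long long v) { return (unsigned int)v; }
```

So to_uchar(-1) is UCHAR_MAX and to_uchar(UCHAR_MAX + 1) is 0, regardless of how negative numbers are represented.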

A conversion to a signed type that cannot represent the value being converted results in either an implementation-defined value, or an implementation-defined signal.
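One way to stay clear of the implementation-defined case is to check the range before converting. A sketch, assuming a hypothetical helper narrow_to_schar that reports failure instead of converting an out-of-range value:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: store v in *out only if signed char can represent it. */
bool narrow_to_schar(int v, signed char *out)
{
    if (v < SCHAR_MIN || v > SCHAR_MAX)
        return false;       /* converting here would be implementation-defined */
    *out = (signed char)v;  /* representable, so the value is unchanged */
    return true;
}
```

The caller decides what to do when the function returns false; nothing is written to *out in that case.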

Note that the above rules are defined in terms of integer values and not in terms of sequences of bits.

n. 1.8e9-where's-my-share m. answered Oct 15 '22 23:10


From the C standard (p. 50 of a 201x draft version, I believe, and not an exact quote):

  • No two signed integer types shall have the same rank.

  • The rank of a signed integer type shall be greater than the rank of any signed integer type with less precision.

  • The rank of long long int is greater than the rank of long int, which is greater than the rank of int, which is greater than the rank of short int, which is greater than the rank of signed char.

  • An unsigned integer type has the same rank as its corresponding signed type (e.g. unsigned int has the same rank as signed int).

  • The rank of any standard integer type shall be greater than the rank of any extended integer type of the same width.

  • The rank of char is equal to the rank of unsigned char and of signed char.

(I'm leaving out bool because you excluded it from your question)

  • The rank of any extended signed integer type relative to another extended signed integer type of the same precision is implementation-defined, but still subject to the other rules of integer conversion rank.

  • For all integer types T1, T2 and T3, if T1 has greater rank than T2 and T2 has greater rank than T3, then T1 has greater rank than T3.

The following may be used in an expression wherever an int or unsigned int may be used: an object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int; or a bit-field of type _Bool, int, signed int or unsigned int. If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

In plain terms:

Any type "smaller" than int (that is, of lower rank) is promoted to int when it is used in an expression, or to unsigned int if int cannot represent all of its values. It is the compiler's job to ensure that C code compiled for a given machine (architecture) is ISO C compliant in that regard. Whether plain char is signed or unsigned is implementation-defined. Converting a value to a signed type that cannot represent it ("demotion") is also implementation-defined.
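The promotion can be observed directly. A sketch that assumes a C11 compiler (for _Generic) and uses hypothetical function names:

```c
/* Hypothetical helpers; _Generic requires C11 or later. */

/* Returns 1 if adding two unsigned char operands yields an expression
   of type int, 0 for any other type. */
int uchar_sum_is_int(void)
{
    unsigned char a = 200, b = 100;
    return _Generic(a + b, int: 1, default: 0);
}

/* Both operands are promoted to int before the addition,
   so the sum does not wrap around at UCHAR_MAX. */
int uchar_sum(void)
{
    unsigned char a = 200, b = 100;
    return a + b;
}
```

On a platform where int can represent every unsigned char value (the common case), uchar_sum_is_int() returns 1 and uchar_sum() returns 300.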

What is implementation-defined? It means that the standard leaves the choice to the implementation, which must document it, so a given compiler will systematically behave the same on a given machine. In other words, all "implementation-defined" behavior depends BOTH on the compiler AND the target machine.

To make portable code:

  • Always promote values to standard C types of greater rank.
  • Never "demote" values to lesser types.
  • Avoid all "implementation-defined" behavior in your code.

Why does that implementation-defined madness exist if it ruins programmers' efforts? System programming basically requires this implementation-defined behavior.

So more specifically toward your question:

  • Truncation will most likely not be portable, or it will require much more effort in maintenance, bug tracking etc. than simply using higher-rank types.
  • If your implementation runs into values bigger than the types involved can hold, your design is wrong (unless you are involved in system programming).
  • As a rule of thumb, going from an unsigned type to a wider signed type preserves values, but not the other way around. So when an unsigned value goes toe to toe against a signed one, promote the unsigned to a signed type wide enough to hold it instead of the other way around.
  • If using integer types that are as small as possible is memory-critical in your application, you should probably revisit the whole program's architecture.
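The signed-versus-unsigned point can be sketched as follows. The helper names are hypothetical, and the second function assumes long long is wider than unsigned int, which holds on common platforms but is not guaranteed by the standard:

```c
/* Hypothetical helpers comparing a signed and an unsigned value. */

/* Signed converted to unsigned: the cast below is what the usual
   arithmetic conversions do implicitly for `s < u`, so -1 becomes
   UINT_MAX before the comparison. */
int less_as_unsigned(int s, unsigned int u)
{
    return (unsigned int)s < u;
}

/* Both operands converted to a wider signed type, which keeps their
   values (assumes long long can represent every unsigned int value). */
int less_as_wider_signed(int s, unsigned int u)
{
    return (long long)s < (long long)u;
}
```

With s = -1 and u = 1, less_as_unsigned returns 0 while less_as_wider_signed returns 1, which matches the mathematical comparison.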
Sebastien answered Oct 15 '22 22:10