
Why does GCC's -Wconversion behave differently for char vs. unsigned char?

Tags: c, gcc

Consider

U8 foo(U8 x, U8 y) {
    return x % y;
}

GCC's -Wconversion behaves differently if U8, the type of x and y, is char or unsigned char:

gcc -Wconversion -c test.c -DU8='unsigned char'

(no warning)

gcc -Wconversion -c test.c -DU8=char
test.c: In function ‘foo’:
test.c:2:14: warning: conversion to ‘char’ from ‘int’ may alter its value [-Wconversion]
     return x % y;
            ~~^~~

But from what I understand, in both cases x and y undergo integer promotion (to int or unsigned int), so in both cases the result is converted from int to the return type (char or unsigned char).

Why is there a difference?

Bonus question: if you enable UBSan (-fsanitize=undefined), GCC emits the -Wconversion warning in both cases.

EDIT:

There is no dispute that x and y undergo integer promotion and that the result must then be converted to the return type, so there is no need to explain that.

The only question here is why GCC behaves differently for the two types. The answer will likely involve some insight into GCC's internals.

sinelaw asked Jun 28 '17 06:06


2 Answers

TLDR

Using information only about the types involved, gcc should warn in both cases, because of the conversion from int (the larger type) to char/unsigned char (the smaller types).

Using information about the possible values as well (range analysis), gcc should warn in neither case, because the result of x % y, even after promotion to int, always fits back into the type of x and y.

So it seems that in the first case gcc can prove that the conversion never changes the value, but for some reason cannot do so in the second case.

As a side note, clang does not warn in either case.


Type system

  • On the tested system (x86-64) the char type is signed. Note that it is still a distinct type from signed char.

  • Due to the integer promotion rules, in both cases x and y are promoted to int, so the result of x % y is of type int.

  • If we make all the implicit conversions explicit then we get this:

    unsigned char foo1(unsigned char x, unsigned char y)
    {
       return (unsigned char)((int) x % (int) y);
    }
    
    char foo2(char x, char y)
    {
       return (char)((int) x % (int) y);
    }
    
  • Implicit conversion from int to char, unsigned char and to signed char fires the warning with -Wconversion:

    -Wconversion

    Warn for implicit conversions that may alter a value. This includes [..] and conversions to smaller types

    Indeed both these functions result in a warning getting generated:

    char bar1(int a)
    {
       return a; // warning: conversion from 'int' to 'char' may change value [-Wconversion]
    }
    
    unsigned char bar2(int a)
    {
       return a;  // warning: conversion from 'int' to 'unsigned char' may change value [-Wconversion]
    }
    

So, using type information only, we should get a warning for both, because both of our functions contain an implicit conversion from int to char/unsigned char, just like bar1 and bar2.
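The promotion described above can be observed directly with C11 _Generic. This is only a sketch: the TYPE_NAME macro and the three function names are hypothetical helpers, not part of the original post, and it assumes a C11 compiler on a typical platform where int can represent every unsigned char value.

```c
/* Hypothetical helper macro (not from the original post): maps an
   expression to the name of its type. char, signed char and
   unsigned char are three distinct types, so all three may be listed. */
#define TYPE_NAME(e) _Generic((e),      \
    char:          "char",              \
    signed char:   "signed char",       \
    unsigned char: "unsigned char",     \
    int:           "int",               \
    unsigned int:  "unsigned int",      \
    default:       "other")

/* Type of x % y when both operands are unsigned char.
   The controlling expression of _Generic is not evaluated,
   so no division actually happens here. */
const char *mod_type_unsigned(unsigned char x, unsigned char y)
{
    return TYPE_NAME(x % y);
}

/* Type of x % y when both operands are plain char. */
const char *mod_type_plain(char x, char y)
{
    return TYPE_NAME(x % y);
}

/* Type of a plain char object itself (no arithmetic, no promotion). */
const char *plain_char_type(char c)
{
    return TYPE_NAME(c);
}
```

Under those assumptions both mod_type_* functions report "int", while plain_char_type reports "char" rather than "signed char", which matches the two bullets on promotion and on char being a distinct type.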

Value analysis

If we use the notation r = x % y (with y ≠ 0), then r is either zero or has the same sign as x, and |r| ∈ [0, |y|).

  • if x and y are of type unsigned char then r ∈ [0, UCHAR_MAX).

    r fits in an unsigned char. So no warning needed.

  • if x and y are of type char:

    • CHAR_MIN = -CHAR_MAX - 1
    • max(|y|) = CHAR_MAX + 1
    • |r| ∈ [0, max(|y|))
    • |r| ∈ [0, CHAR_MAX + 1)
    • r ∈ (-CHAR_MAX - 1, CHAR_MAX + 1)

    r fits in a char so no warning needed.

So what I am arguing is that the result of x % y always fits in a U8, even after all the integer promotions and implicit conversions.
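The range argument can also be checked by brute force. The following is a minimal sketch, not something from the original answer: the function names are made up for illustration, and the loops use int variables so that the arithmetic happens in int, as it would after promotion.

```c
#include <limits.h>

/* Counts pairs (x, y), y != 0, over the whole range of plain char
   for which x % y (computed in int) does not fit back into char. */
int count_char_misfits(void)
{
    int misfits = 0;
    for (int x = CHAR_MIN; x <= CHAR_MAX; x++) {
        for (int y = CHAR_MIN; y <= CHAR_MAX; y++) {
            if (y == 0)
                continue;               /* x % 0 is undefined behaviour */
            int r = x % y;              /* same arithmetic as after promotion */
            if (r < CHAR_MIN || r > CHAR_MAX)
                misfits++;
        }
    }
    return misfits;
}

/* Same check over the whole range of unsigned char. */
int count_uchar_misfits(void)
{
    int misfits = 0;
    for (int x = 0; x <= UCHAR_MAX; x++) {
        for (int y = 1; y <= UCHAR_MAX; y++) {  /* start at 1 to skip y == 0 */
            int r = x % y;
            if (r < 0 || r > UCHAR_MAX)
                misfits++;
        }
    }
    return misfits;
}
```

Both functions return 0, which agrees with the claim that the conversion back to the operand type never changes the value.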


You can have a look at this godbolt

bolov answered Oct 31 '22 17:10


As you say, x % y involves implicit type conversion of both operands to int (the integer promotion rule/the usual arithmetic conversions). The result of the operation is type int.

-Wconversion is concerned about implicit changes of signedness in an expression, since those might possibly be unintended. It gives warnings when you convert between signed and unsigned type without an explicit cast. It also apparently warns against potential issues with overflow, when implicitly converting from a larger type (signed or unsigned) to a smaller one.

(The char type has implementation-defined signedness; it may be either signed or unsigned. GCC makes it signed on all implementations I've seen.)
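Whether plain char is signed on a given target can be checked from <limits.h>. A small sketch, with a hypothetical function name:

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise.
   CHAR_MIN is 0 when char is unsigned and negative when it is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

With GCC the default can be overridden with -fsigned-char or -funsigned-char, so the result may differ between builds of the same code.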

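An implicit conversion from int to char may yield a value that does not fit in the char; when char is signed, the result of such an out-of-range conversion is implementation-defined.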

We can silence the compiler by writing return (char)(x % y);. This only hides the potential bug, though. Before adding such an explicit cast, you have to ensure in your code that the overflow can never happen.
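One way to pair the cast with such a guarantee is sketched below. The function name and the fallback parameter are assumptions made for illustration, not part of the original code.

```c
/* Remainder of x by y for plain char operands, with an explicit cast.
   The cast is safe because |x % y| < |y|, so the int result always
   fits in char; the only case left to guard is y == 0. */
char checked_mod(char x, char y, char fallback)
{
    if (y == 0)
        return fallback;        /* x % 0 would be undefined behaviour */
    return (char)(x % y);       /* explicit cast silences -Wconversion */
}
```

For example, checked_mod(7, 3, 0) yields 1, and checked_mod(7, 0, 0) yields the fallback value 0 instead of dividing by zero.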

Lundin answered Oct 31 '22 18:10