Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C safely taking absolute value of integer

Tags:

Consider following program (C99):

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

int main(void)
{
    printf("Enter int in range %jd .. %jd:\n > ", INTMAX_MIN, INTMAX_MAX);
    intmax_t i;
    if (scanf("%jd", &i) == 1)
        printf("Result: |%jd| = %jd\n", i, imaxabs(i));
}

Now as I understand it, this contains easily triggerable undefined behaviour, like this:

Enter int in range -9223372036854775808 .. 9223372036854775807:
 > -9223372036854775808
Result: |-9223372036854775808| = -9223372036854775808

Questions:

  1. Is this really undefined behaviour, as in "code is allowed to trigger any code path, which any code that stroke compiler's fancy", when user enters the bad number? Or is it some other flavor of not-completely-defined?

  2. How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?

(There are a few related questions, but I didn't find one which answers question 2 above, so if you suggest duplicate, please make sure it answers that.)

like image 507
hyde Avatar asked Feb 07 '16 08:02

hyde


People also ask

What is absolute value of an integer in C?

The abs() function only returns the positive numbers. For example: Suppose we have an integer number -5, and we want to get the absolute number, we use the abs() function to return the positive number as 5.

How do you write the absolute value of an integer?

The symbol to represent absolute value is |x|, where x is an integer.

How do you do absolute difference in C?

C library function - abs() The C library function int abs(int x) returns the absolute value of int x.


2 Answers

If the result of imaxabs cannot be represented, can happen if using two's complement, then the behavior is undefined.

7.8.2.1 The imaxabs function

  1. The imaxabs function computes the absolute value of an integer j. If the result cannot be represented, the behavior is undefined. 221)

221) The absolute value of the most negative number cannot be represented in two’s complement.

The check that makes no assumptions and is always defined is:

intmax_t i = ... ;
if( i < -INTMAX_MAX )
{
    //handle error
}

(This if statement cannot be taken if using one's complement or sign-magnitude representation, so the compiler might give a unreachable code warning. The code itself is still defined and valid. )

like image 72
2501 Avatar answered Sep 28 '22 10:09

2501


How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?

One method is to use unsigned integers. The overflow behaviour of unsigned integers is well-defined as is the behaviour when converting from a signed to an unsigned integer.

So I think the following should be safe (turns out it's horriblly broken on some really obscure systems, see later in the post for an improved version)

uintmax_t j = i;
if (j > (uintmax_t)INTMAX_MAX) {
  j = -j;
}
printf("Result: |%jd| = %ju\n", i, j);

So how does this work?

uintmax_t j = i;

This converts the signed integer into an unsigned one. IF it's positive the value stays the same, if it's negative the value increases by 2n (where n is the number of bits). This converts it to a large number (larger than INTMAX_MAX)

if (j > (uintmax_t)INTMAX_MAX) {

If the original number was positive (and hence less than or equal to INTMAX_MAX) this does nothing. If the original number was negative the inside of the if block is run.

  j = -j;

The number is negated. The result of a negation is clearly negative and so cannot be represented as an unsigned integer. So it is increased by 2n.

So algebraically the result for negative i looks like

j = - (i + 2n) + 2n = -i


Clever, but this solution makes assumptions. This fails if INTMAX_MAX == UINTMAX_MAX, which is allowed by C Standard.

Hmm, lets look at this (i'm reading https://busybox.net/~landley/c99-draft.html which is apprarently the last C99 draft prior to standardisation, if anything changed in the final standard please do tell me.

When typedef names differing only in the absence or presence of the initial u are defined, they shall denote corresponding signed and unsigned types as described in 6.2.5; an implementation shall not provide a type without also providing its corresponding type.

In 6.2.5 I see

For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.

In 6.2.6.2 I see

#1

For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2N-1, so that >objects of that type shall be capable of representing values from 0 to 2N-1 >using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.39)

#2

For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M<=N). If the sign bit is zero, it shall not affect the resulting value.

So yes it seems you are right, while the signed and unsigned types have to be the same size it does seem to be valid for the unsigned type to have one more padding bit than the signed type.


Ok, based on the analysis above revealing a flaw in my first attempt i've written a more paranoid variant. This has two changes from my first version.

I use i < 0 rather than j > (uintmax_t)INTMAX_MAX to check for negative numbers. This means that the algorithm proceduces correct results for numbers grater than or equal to -INTMAX_MAX even when INTMAX_MAX == UINTMAX_MAX.

I add handling for the error case where INTMAX_MAX == UINTMAX_MAX, INTMAX_MIN == -INTMAX_MAX -1 and i == INTMAX_MIN. This will result in j=0 inside the if condition which we can easilly test for.

It can be seen from the requirements in the C standard that INTMAX_MIN cannot be smaller than -INTMAX_MAX -1 since there is only one sign bit and the number of value bits must be the same or lower than in the corresponding unsigned type. There are simply no bit patterns left to represent smaller numbers.

uintmax_t j = i;
if (i < 0) {
  j = -j;
  if (j == 0) {
    printf("your platform sucks\n");
    exit(1);
  }
}
printf("Result: |%jd| = %ju\n", i, j);

@plugwash I think 2501 is correct. For example, -UINTMAX_MAX value becomes 1: (-UINTMAX_MAX + (UINTMAX_MAX + 1)), and is not caught by your if. – hyde 58 mins ago

Umm,

assuming INTMAX_MAX == UINTMAX_MAX and i = -INTMAX_MAX

uintmax_t j = i;

after this command j = -INTMAX_MAX + (UINTMAX_MAX + 1) = 1

if (i < 0) {

i is less than zero so we run the commands inside the if

j = -j;

after this command j = -1 + (UINTMAX_MAX + 1) = UINTMAX_MAX

which is the correct answer, so no need to trap it in an error case.

like image 38
plugwash Avatar answered Sep 28 '22 11:09

plugwash