
What is the proper way to store narrower data types into a wider data type in the C language?

Tags:

c

types

I'm currently fixing a legacy bug in C code. In the process, I stored an unsigned int into an unsigned long long. To my surprise, the math stopped working when I compiled the code with a 64-bit version of GCC. I discovered that when I assigned an int value to a long long, I got a number that looked like 0x0000000012345678 on the 32-bit build, but on the 64-bit machine that number became 0xFFFFFFFF12345678.

Can someone explain to me or point me to some sort of spec or documentation on what is supposed to happen when storing a smaller data type in a larger one and perhaps what the appropriate pattern for doing this in C is?

Update - Code Sample

Here's what I'm doing:

// Results in 0xFFFFFFFFC0000000 in 64 bit gcc 4.1.2
// Results in 0x00000000C0000000 in 32 bit gcc 3.4.6
u_long foo = 3 * 1024 * 1024 * 1024;

Jason Thompson asked Jul 29 '14 20:07


2 Answers

I think you have to tell the compiler that the number on the right is unsigned. Otherwise it thinks it's a normal signed int, and since the sign bit is set, it thinks it's negative, and then it sign-extends it into the receiver.

So do some unsigned casting on the right.

Mike Dunlavey answered Sep 30 '22 08:09


Expressions are generally evaluated independently; their results are not affected by the context in which they appear.

An unsuffixed decimal integer constant like 1024 has the first of the types int, long int, long long int in which its value fits; in the particular case of 1024 that's always int.

I'll assume here that u_long is a typedef for unsigned long (though you also mentioned long long in your question).

So given:

unsigned long foo = 3 * 1024 * 1024 * 1024;

the four constants in the initialization expression are all of type int, and all three multiplications are int-by-int. The result happens to be greater (by a factor of 1.5) than 2^31, which means it won't fit in an int on a system where int is 32 bits. The int result, whatever it is, will be implicitly converted to the target type unsigned long, but by that time it's too late; the overflow has already occurred.

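The overflow means that your code has undefined behavior (and since this can be determined at compile time, I'd expect your compiler to warn about it). In practice, signed overflow typically wraps around, so the multiplication will typically yield the int value -1073741824. Converting that to unsigned long gives 0xFFFFFFFFC0000000 where unsigned long is 64 bits and 0xC0000000 where it is 32 bits, which matches what you saw. You can't count on that (and it's not what you want anyway).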

The ideal solution is to avoid the implicit conversions by ensuring that everything is of the target type in the first place:

unsigned long foo = 3UL * 1024UL * 1024UL * 1024UL;

(Strictly speaking only the first operand needs to be of type unsigned long, but it's simpler to be consistent.)

Let's look at the more general case:

int a, b, c, d; /* assume these are initialized */
unsigned long foo = a * b * c * d;

You can't add a UL suffix to a variable. If possible, you should change the declarations of a, b, c, and d so they're of type unsigned long, but perhaps there's some other reason they need to be of type int. You can add casts to explicitly convert each one to the correct type. By using casts, you can control exactly when the conversions are performed:

unsigned long foo = (unsigned long)a *
                    (unsigned long)b *
                    (unsigned long)c *
                    (unsigned long)d;

This gets a bit verbose; you might consider applying the cast only to the leftmost operand (after making sure you understand how the expression is parsed).

NOTE: This will not work:

unsigned long foo = (unsigned long)(a * b * c * d);

The cast converts the int result to unsigned long, but only after the overflow has already occurred. It merely makes explicit the conversion that would have been performed implicitly.

Keith Thompson answered Sep 30 '22 10:09