Adding or assigning an integer literal to a size_t

Tags: c, size-t, c89

In C I see a lot of code that adds or assigns an integer literal to a size_t variable.

size_t foo = 1;
foo += 1;

What conversion takes place here, and can it ever happen that a size_t is "upgraded" to an int and then converted back to a size_t? Would it still wrap around if it was already at the maximum?

size_t foo = SIZE_MAX;
foo += 1;

Is that defined behavior? Here an unsigned size_t has a signed int added to it (which may be a larger type?), and the result is then converted back to a size_t. Is there a risk of signed integer overflow?

Would it make sense to write something like foo + bar + (size_t)1 instead of foo + bar + 1? I never see code like that, but I'm wondering if it's necessary if integer promotions are troublesome.

C89 doesn't say how a size_t will be ranked or what exactly it is:

The value of the result is implementation-defined, and its type (an unsigned integral type) is size_t defined in the <stddef.h> header.

asked Oct 22 '16 at 02:10 by newguy



2 Answers

The current C standard permits an implementation on which the following code would have undefined behavior. However, no such implementation exists, and probably none ever will:

size_t foo = SIZE_MAX;
foo += 1;

The type size_t is an unsigned type (footnote 1) whose minimum range (footnote 2) is [0,65535].

The type size_t may be defined as a synonym for the type unsigned short. The type unsigned short may be defined having 16 precision bits, with the range: [0,65535]. In that case the value of SIZE_MAX is 65535.

The type int may be defined having 16 precision bits (plus one sign bit), two's complement representation, and range: [-65536,65535].

The expression foo += 1 is equivalent to foo = foo + 1 (except that foo is evaluated only once, which is irrelevant here). The variable foo is subject to the integer promotions (footnote 3). It is promoted to type int because int can represent all values of type size_t and the rank of size_t, being a synonym for unsigned short, is lower than the rank of int. Since the maximum values of size_t and int are the same, the addition overflows a signed type, which is undefined behavior.

This holds for the current standard, and it should also hold for C89 since it doesn't have any stricter restrictions on types.

The way to avoid signed overflow on any imaginable implementation is to use an unsigned int integer constant:

foo += 1u;

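In that case, if foo has a lower rank than int, it is first promoted by the integer promotions and then converted to unsigned int by the usual arithmetic conversions, so the addition is carried out in unsigned arithmetic and wraps around instead of overflowing.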


1 (Quoted from ISO/IEC 9899/201x 7.19 Common definitions 2)
size_t
which is the unsigned integer type of the result of the sizeof operator;

2 (Quoted from ISO/IEC 9899/201x 7.20.3 Limits of other integer types 2)
limit of size_t
SIZE_MAX 65535

3 (Quoted from ISO/IEC 9899/201x 6.3.1.1 Boolean, characters, and integers 2)
The following may be used in an expression wherever an int or unsigned int may be used:
- An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

answered Sep 26 '22 at 02:09 by 2501


It depends, since size_t is an implementation-defined unsigned integral type.

Operations involving a size_t may therefore involve promotions or conversions, but these depend on what size_t actually is and on the other types that appear in the expression.

If size_t was equivalent to an unsigned short (e.g. a 16-bit type) then

size_t foo = 1;
foo += 1;

would (semantically) promote foo to an int (provided int can represent every value of unsigned short; otherwise foo is promoted to unsigned int), add 1, and then convert the result back to size_t for storing in foo. (I say "semantically" because that is the meaning of the code according to the standard. A compiler is free to apply the "as if" rule, i.e. do anything it likes as long as it delivers the same net effect.)

On the other hand, if size_t was equivalent to a long long unsigned (e.g. a 64-bit unsigned type), then the same code would convert 1 to type long long unsigned, add that to the value of foo, and store the result back into foo.

In both cases, the net result is the same unless an overflow occurs. In this case, there is no overflow, since both int and size_t are guaranteed to be able to represent the values 1 and 2.

If an overflow occurs (e.g. adding a larger integral value), then the behaviour can vary. Overflow of a signed integral type (e.g. int) results in undefined behaviour. Overflow of an unsigned integral type uses modulo arithmetic.

As to the code

size_t foo = SIZE_MAX;
foo += 1;

it is possible to do the same sort of analysis.

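If size_t is equivalent to an unsigned short, then foo is subject to the integer promotions. If int has the same range as a signed short, it cannot represent the value of SIZE_MAX, so foo is promoted to unsigned int rather than int; the addition then uses modulo arithmetic and the result stored in foo is 0, with no undefined behaviour. If int is able to represent a larger range than short int (e.g. it is equivalent to long), then the conversion of foo to int will succeed, incrementing that value will succeed, and storing back to size_t will use modulo arithmetic and produce the result of 0. Undefined behaviour arises only in the corner case where INT_MAX is exactly equal to SIZE_MAX, because foo is then promoted to int and the increment overflows.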

If size_t is equivalent to unsigned long, then the value 1 will be converted to unsigned long, adding that to foo will use modulo arithmetic (i.e. produce a result of zero), and that will be stored into foo.

It is possible to do similar analyses assuming that size_t is actually other unsigned integral types.

Note: In modern systems, a size_t that is the same size as, or smaller than, an int is unusual. However, such systems have existed (e.g. Microsoft and Borland C compilers targeting 16-bit MS-DOS on hardware with an 80286 CPU). There are also 16-bit microprocessors still in production, mostly for use in embedded systems with lower power usage and low throughput requirements, and C compilers that target them (e.g. the Keil C166 compiler, which targets the Infineon XE166 microprocessor family). [Note: I've never had reason to use the Keil compiler but, given its target platform, it would not be a surprise if it supports a 16-bit size_t that is the same size as, or smaller than, the native int type on that platform.]

answered Sep 27 '22 at 02:09 by Peter


