
Why prefer signed over unsigned in C++? [closed]


As per requests in comments: I prefer int instead of unsigned because...

  1. it's shorter (I'm serious!)

  2. it's more generic and more intuitive (i.e. I like to be able to assume that 1 - 2 is -1 and not some obscure huge number; see the sketch after this answer)

  3. what if I want to signal an error by returning an out-of-range value?

Of course there are counter-arguments, but these are the principal reasons I like to declare my integers as int instead of unsigned. This is not always true; in other cases an unsigned is simply the better tool for the task. I am just answering the "why would anyone prefer defaulting to signed" question specifically.
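
A minimal sketch of points 2 and 3, assuming a 32-bit unsigned int (the exact huge value printed depends on the width of unsigned on the target):

    #include <cstdio>

    int main() {
        int a = 1, b = 2;
        unsigned ua = 1, ub = 2;

        // Signed subtraction behaves intuitively: this prints -1.
        printf("%d\n", a - b);

        // Unsigned subtraction wraps around modulo 2^N: with a 32-bit
        // unsigned int this prints 4294967295, not -1.
        printf("%u\n", ua - ub);

        // Point 3: a signed return type leaves room for an out-of-range
        // error sentinel such as -1; an unsigned index type does not.
        return 0;
    }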


Let me paraphrase the video, as the experts said it succinctly.

Andrei Alexandrescu:

  • No simple guideline.
  • In systems programming, we need integers of different sizes and signedness.
  • Many conversions and arcane rules govern arithmetic (like for auto), so we need to be careful.

Chandler Carruth:

  • Here are some simple guidelines:
    1. Use signed integers unless you need two's complement arithmetic or a bit pattern.
    2. Use the smallest integer that will suffice.
    3. Otherwise, use int if you think you could count the items, and a 64-bit integer if it's even more than you would want to count.
  • Stop worrying and use tools to tell you when you need a different type or size.

Bjarne Stroustrup:

  • Use int until you have a reason not to.
  • Use unsigned only for bit patterns.
  • Never mix signed and unsigned.

Wariness about signedness rules aside, my one-sentence takeaway from the experts:

Use the appropriate type, and when you don't know, use an int until you do know.


Several reasons:

  1. Arithmetic on unsigned always yields unsigned, which can be a problem when subtracting integer quantities that can reasonably yield a negative value — think subtracting money amounts to get a balance, or array indices to get the distance between elements. If the operands are unsigned, you get a perfectly defined but almost certainly meaningless result, and a result < 0 comparison will always be false (which modern compilers will fortunately warn you about).

  2. unsigned has the nasty property of contaminating arithmetic wherever it gets mixed with signed integers. So if you add a signed and an unsigned value and ask whether the result is greater than zero, you can get bitten, especially when the unsigned integral type is hidden behind a typedef. Both pitfalls are sketched below.
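
A small sketch of both points, using std::vector::size() (which returns an unsigned type) as the typical way the unsigned operand sneaks in:

    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3};
        int i = -1;

        // The usual arithmetic conversions turn the signed operand into an
        // unsigned one, so -1 becomes a huge value and the test is false.
        // Most compilers warn about this comparison (-Wsign-compare).
        if (i < v.size())
            std::cout << "taken\n";
        else
            std::cout << "not taken: -1 was converted to a large unsigned value\n";

        // Subtraction is contaminated the same way: size() - 4 wraps instead
        // of going negative, so this condition can never be true.
        if (v.size() - 4 < 0)
            std::cout << "never printed\n";

        return 0;
    }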


There are no reasons to prefer signed over unsigned, aside from purely sociological ones, i.e. some people believe that average programmers are not competent and/or attentive enough to write proper code in terms of unsigned types. This is often the main reasoning used by various "speakers", regardless of how respected those speakers might be.

In reality, competent programmers quickly develop and/or learn the basic set of programming idioms and skills that allow them to write proper code in terms of unsigned integral types.

Note also that the fundamental differences between signed and unsigned semantics are always present (in a superficially different form) in other parts of C and C++, like pointer arithmetic and iterator arithmetic. This means that in the general case the programmer does not really have the option of avoiding the issues specific to unsigned semantics and the "problems" it brings with it. I.e. whether you want it or not, you have to learn to work with ranges that terminate abruptly at their left end, right here (not somewhere off in the distance), even if you adamantly avoid unsigned integers.

Also, as you probably know, many parts of the standard library already rely quite heavily on unsigned integer types. Forcing signed arithmetic into the mix, instead of learning to work with unsigned arithmetic, will only result in disastrously bad code.
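
One example of the kind of idiom the answer presumably has in mind: iterating a standard container backwards with an unsigned index, without ever computing a negative value (the function name here is just for illustration):

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Reverse iteration with an unsigned index: the test "i-- > 0" uses the
    // old value of i, so the body sees size-1, ..., 1, 0 and the loop stops
    // cleanly when i reaches 0, with no negative index ever needed.
    void print_reversed(const std::vector<int>& v) {
        for (std::size_t i = v.size(); i-- > 0; )
            std::cout << v[i] << '\n';
    }

    int main() {
        print_reversed({1, 2, 3});   // prints 3, 2, 1
        return 0;
    }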

The only real reason to prefer signed in some contexts that comes to mind is that in mixed integer/floating-point code, signed integer formats are typically directly supported by the FPU instruction set, while unsigned formats are not, forcing the compiler to generate extra code for conversions between floating-point values and unsigned values. In such code signed types might perform better.

But at the same time, in purely integer code unsigned types might perform better than signed types. For example, integer division often requires additional corrective code in order to satisfy the requirements of the language spec. The correction is only necessary for negative operands, so it wastes CPU cycles in situations where negative operands never actually occur.
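
A small illustration of the correction the language spec forces (assuming a typical two's-complement target, where >> on a negative value is an arithmetic shift, as required since C++20): dividing by a power of two is a plain shift for unsigned operands, but for signed operands the shift alone rounds the wrong way.

    #include <cassert>

    int main() {
        // Signed division must truncate toward zero, so -7 / 2 is -3 (not -4)
        // and -7 % 2 is -1.
        assert(-7 / 2 == -3);
        assert(-7 % 2 == -1);

        int x = -7;
        assert((x >> 1) == -4);   // arithmetic shift rounds toward -infinity
        assert((x / 2) == -3);    // division rounds toward zero: extra code needed

        unsigned u = 7;
        assert((u >> 1) == (u / 2));   // for unsigned, the shift alone suffices
        return 0;
    }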

In my practice I devotedly stick to unsigned wherever I can, and use signed only if I really have to.


The integral types in C and many languages which derive from it have two general usage cases: to represent numbers, or to represent members of an abstract algebraic ring. For those unfamiliar with abstract algebra, the primary notion behind a ring is that adding, subtracting, or multiplying two items of a ring should yield another item of that ring--it shouldn't crash or yield a value outside the ring. On a 32-bit machine, adding unsigned 0x12345678 to unsigned 0xFFFFFFFF doesn't "overflow"--it simply yields the result 0x12345677, which is defined for the ring of integers congruent mod 2^32 (because the arithmetic result of adding 0x12345678 to 0xFFFFFFFF, i.e. 0x112345677, is congruent to 0x12345677 mod 2^32).
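
The same wraparound, checked at compile time (a minimal sketch; uint32_t stands in for the 32-bit unsigned type discussed above):

    #include <cstdint>

    // 0x12345678 + 0xFFFFFFFF = 0x112345677, which reduced mod 2^32 is
    // 0x12345677 -- exactly the value 32-bit unsigned addition yields.
    constexpr std::uint32_t a = 0x12345678u;
    constexpr std::uint32_t b = 0xFFFFFFFFu;
    constexpr std::uint32_t sum = a + b;   // defined behaviour: arithmetic mod 2^32
    static_assert(sum == 0x12345677u, "unsigned addition wraps mod 2^32");

    int main() { return 0; }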

Conceptually, both purposes (representing numbers, or representing members of the ring of integers congruent mod 2^n) may be served by both signed and unsigned types, and many operations are the same for both usage cases, but there are some differences. Among other things, an attempt to add two numbers should not be expected to yield anything other than the correct arithmetic sum. While it's debatable whether a language should be required to generate the code necessary to guarantee that an incorrect sum is never silently produced (e.g. by throwing an exception instead), one could argue that for code which uses integral types to represent numbers such behavior would be preferable to yielding an arithmetically-incorrect value, and compilers shouldn't be forbidden from behaving that way.

The implementers of the C standards decided to use signed integer types to represent numbers and unsigned types to represent members of the algebraic ring of integers congruent mod 2^n. By contrast, Java uses signed integers to represent members of such rings (though they're interpreted differently in some contexts; conversions among differently-sized signed types, for example, behave differently from conversions among unsigned ones), and Java has neither unsigned integers nor any primitive integral type which behaves as a number in all non-exceptional cases.

If a language provided a choice of signed and unsigned representations for both numbers and algebraic-ring numbers, it might make sense to use unsigned numbers to represent quantities that will always be positive. If, however, the only unsigned types represent members of an algebraic ring, and the only types that represent numbers are the signed ones, then even if a value will always be positive it should be represented using a type designed to represent numbers.

Incidentally, the reason that (uint32_t)-1 is 0xFFFFFFFF stems from the fact that casting a signed value to unsigned is equivalent to adding it to unsigned zero, and adding an integer to an unsigned value is defined as adding or subtracting its magnitude to/from the unsigned value according to the rules of the algebraic ring, which specify that if X=Y-Z, then X is the one and only member of that ring such that X+Z=Y. In unsigned math, 0xFFFFFFFF is the only number which, when added to unsigned 1, yields unsigned zero.
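
A compile-time check of that fact (again using uint32_t as the 32-bit unsigned type):

    #include <cstdint>

    // -1 converted to a 32-bit unsigned type is reduced mod 2^32, yielding
    // the one value that, when 1 is added to it, gives unsigned zero again.
    constexpr std::uint32_t x = static_cast<std::uint32_t>(-1);
    static_assert(x == 0xFFFFFFFFu, "(uint32_t)-1 == 0xFFFFFFFF");
    static_assert(static_cast<std::uint32_t>(x + 1u) == 0u,
                  "0xFFFFFFFF + 1 wraps back to 0");

    int main() { return 0; }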


Speed is the same on modern architectures. The problem with unsigned int is that it can sometimes generate unexpected behavior. This can create bugs that wouldn't show up otherwise.

Normally, when you subtract 1 from a value, the value gets smaller. But with both signed and unsigned int variables there is a point at which subtracting 1 produces a value that is MUCH LARGER. The key difference between unsigned int and int is that with unsigned int the value that generates this paradoxical result is a commonly used value --- 0 --- whereas with signed the number (INT_MIN) is safely far away from normal operations.
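
A sketch of the most common way this bites in practice: a countdown loop. With an unsigned index the condition i >= 0 is always true, so the loop never ends.

    #include <cstdio>

    int main() {
        // Buggy version: an unsigned i is always >= 0, so after i reaches 0
        // the decrement wraps it to a huge value and the loop never exits.
        // for (unsigned i = 9; i >= 0; --i)
        //     printf("%u\n", i);

        // Signed version terminates as intended, printing 9 down to 0.
        for (int i = 9; i >= 0; --i)
            printf("%d\n", i);
        return 0;
    }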

As far as returning -1 for an error value --- modern thinking is that it's better to throw an exception than to test for return values.

It's true that if you properly defend your code you won't have this problem, and if you use unsigned religiously everywhere you will be okay (provided that you are only adding, and never subtracting, and that you never get near UINT_MAX). I use unsigned int everywhere. But it takes a lot of discipline. For a lot of programs, you can get by with using int and spend your time on other bugs.


To answer the actual question: for the vast majority of things, it doesn't really matter. int can be a little easier to deal with for things like subtraction where the second operand is larger than the first: you still get the "expected" (negative) result.

There is absolutely no speed difference in 99.9% of cases, because the ONLY instructions that are different for signed and unsigned numbers are:

  1. Widening the number (sign-extension for signed, zero-extension for unsigned) - it takes the same effort to do both.
  2. Comparisons - for signed numbers, the processor has to take into account whether either number is negative. But again, it's the same speed to compare signed or unsigned numbers - it just uses a different instruction code to say (essentially) "numbers that have the highest bit set are smaller than numbers with the highest bit not set". [Pedantically, it's nearly always the operation using the RESULT of a comparison that differs - the most common case being a conditional jump or branch instruction - but either way it's the same effort; the inputs are just taken to mean slightly different things.] See the sketch after this list.
  3. Multiplication and division. Obviously, the sign of the result needs to be handled for a signed multiplication, whereas an unsigned multiplication must not treat the highest bit of an input as a sign bit. And again, the effort is (as near as we care about) identical.

(I think there are one or two other cases, but the result is the same - it really doesn't matter whether it's signed or unsigned; the effort to perform the operation is the same for both.)
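
A pair of functions illustrating the comparison point; the instruction names in the comment are the usual x86 ones and are mentioned only as a typical example:

    #include <cstdint>

    // Both functions perform one compare. On x86, for instance, they compile
    // to the same cmp instruction; only the following conditional operation
    // differs (e.g. setl/jl for the signed version, setb/jb for the unsigned
    // one), and both forms take the same time.
    bool less_signed(std::int32_t a, std::int32_t b)     { return a < b; }
    bool less_unsigned(std::uint32_t a, std::uint32_t b) { return a < b; }

    int main() {
        return (less_signed(-1, 1) && !less_unsigned(1, 1)) ? 0 : 1;
    }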