 

About the use of signed integers in C family of languages

When using integer values in my own code, I always try to consider the signedness, asking myself if the integer should be signed or unsigned.

When I'm sure the value will never need to be negative, I then use an unsigned integer.
And I have to say this happens most of the time.

When reading other people's code, I rarely see unsigned integers, even if the represented value can't be negative.

So I asked myself: «is there a good reason for this, or do people just use signed integers because they don't care»?

I've searched on the subject, here and in other places, and I have to say I can't find a good reason not to use unsigned integers when it applies.

I came across those questions: «Default int type: Signed or Unsigned?», and «Should you always use 'int' for numbers in C, even if they are non-negative?» which both present the following example:

for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}

To me, this is just bad design. Of course, it may result in an infinite loop, with unsigned integers.
But is it so hard to check if foo.Length() is 0, before the loop?
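For example, the reverse loop can be written with an unsigned index and no extra branch at all, by testing the index before decrementing it. A sketch in C, with n standing in for foo.Length() and the function name purely illustrative:

void clear_backwards(int *items, unsigned int n)
{
    /* i-- > 0 tests the value before the decrement, so the body runs
       with i == n-1 down to i == 0, and the loop simply never starts
       when n is 0 - no wraparound, no infinite loop. */
    for (unsigned int i = n; i-- > 0; ) {
        items[i] = 0; /* placeholder work */
    }
}

This handles an empty container correctly without any separate check, while keeping the index unsigned.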

So I personally don't think this is a good reason for using signed integers all the way.

Some people may also say that signed integers may be useful, even for non-negative values, to provide an error flag, usually -1.

Ok, that's good to have a specific value that means «error».
But then, what's wrong with something like UINT_MAX, for that specific value?
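As a sketch of that approach (the function and names here are purely illustrative), a search routine can reserve UINT_MAX as its "not found" value, exactly the way a signed version would reserve -1:

#include <limits.h>

#define INDEX_NOT_FOUND UINT_MAX  /* reserved sentinel, analogous to -1 */

/* Returns the index of the first occurrence of value in array,
   or INDEX_NOT_FOUND if value is not present. */
unsigned int find_index(const int *array, unsigned int count, int value)
{
    for (unsigned int i = 0; i < count; i++) {
        if (array[i] == value) {
            return i;
        }
    }
    return INDEX_NOT_FOUND;
}

The only caveat is that UINT_MAX can then never be a valid index, which is fine in practice as long as count is strictly smaller than UINT_MAX.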

I'm actually asking this question because signedness mismatches may lead to serious problems, usually when using third-party libraries.

In such a case, you often have to deal with signed and unsigned values.

Most of the time, people just don't care about the signedness and assign, for instance, an unsigned int to a signed int without checking the range.

I have to say I'm a bit paranoid with compiler warning flags, so with my setup, such an implicit conversion will result in a compiler error.
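For reference, the kind of setup meant here could look something like this with GCC or Clang (these particular warning flags are an example of a strict configuration, not necessarily the exact ones used):

cc -Wall -Wextra -Wconversion -Wsign-conversion -Wsign-compare -Werror -c file.c

With -Werror, every signed/unsigned conversion those warnings can detect becomes a hard error instead of a warning.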

For that kind of stuff, I usually use a function or macro to check the range, and then assign using an explicit cast, raising an error if needed.

This just seems logical to me.
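A minimal sketch of what such a checked assignment could look like in C (the function name and the way the error is reported are illustrative only):

#include <limits.h>
#include <stdbool.h>

/* Converts an unsigned int to an int, refusing values that don't fit
   instead of relying on an implicit, possibly value-changing conversion. */
bool uint_to_int(unsigned int value, int *out)
{
    if (value > (unsigned int)INT_MAX) {
        return false;          /* out of range, caller decides how to fail */
    }

    *out = (int)value;         /* explicit cast, provably in range */
    return true;
}

The caller then decides what "raising an error" means in context: logging and aborting, propagating an error code, and so on.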

One last example, since I'm also an Objective-C developer (note that this question is not related to Objective-C only):

- ( NSInteger )tableView: ( UITableView * )tableView numberOfRowsInSection: ( NSInteger )section;

For those not fluent with Objective-C, NSInteger is a signed integer.
This method actually retrieves the number of rows in a table view, for a specific section.

The result will never be a negative value (and neither will the section number, by the way).

So why use a signed integer for this?
I really don't understand.

This is just one example, but I always see that kind of thing, in C, C++ or Objective-C.

So again, I'm just wondering whether people simply don't care about that kind of problem, or whether there is, after all, a good and valid reason not to use unsigned integers in such cases.

Looking forward to hearing your answers :)

asked Dec 08 '11 by Macmade

2 Answers

  • a signed return value might yield more information (think error numbers: 0 is sometimes a valid answer, -1 indicates an error, see man read and the sketch after this list) ... which might be relevant especially for developers of libraries.

  • if you are worrying about the one extra bit you gain when using unsigned instead of signed then you are probably using the wrong type anyway. (also kind of "premature optimization" argument)

  • languages like Python, Ruby, JavaScript etc. are doing just fine without signed vs unsigned types. That might be an indicator ...
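To illustrate the first bullet, read(2) is the classic case: its signed ssize_t return value lets -1 signal an error while 0 (end of file) and positive byte counts stay distinct, valid results. A minimal sketch (the function and variable names are illustrative):

#include <stdio.h>
#include <unistd.h>

/* Reads and discards everything from fd, showing how the three kinds
   of return value of read(2) are told apart. */
static void drain(int fd)
{
    char    buffer[4096];
    ssize_t n;

    while ((n = read(fd, buffer, sizeof buffer)) != 0) {
        if (n < 0) {
            perror("read");    /* error, details in errno */
            return;
        }
        /* n > 0: n bytes are now in buffer */
    }
    /* n == 0: end of file */
}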

answered Oct 17 '22 by akira


When using integer values in my own code, I always try to consider the signedness, asking myself if the integer should be signed or unsigned.

When I'm sure the value will never need to be negative, I then use an unsigned integer. And I have to say this happens most of the time.

Carefully considering which type is most suitable each time you declare a variable is very good practice! This means you are careful and professional. You should not only consider signedness, but also the potential maximum value that you expect the variable to hold.

The reason why you shouldn't use signed types when they aren't needed has nothing to do with performance, but with type safety. There are lots of potential, subtle bugs that can be caused by signed types:

  • The various forms of implicit promotions that exist in C can cause your type to change signedness in unexpected and possibly dangerous ways: the integer promotions that are part of the usual arithmetic conversions, the lvalue conversion upon assignment, the default argument promotions used by for example variadic functions (VA lists), and so on. (A small sketch of this follows after the list.)

  • When using any form of bitwise operators or similar hardware-related programming, signed types are dangerous and can easily cause various forms of undefined behavior.
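A small illustration of the first point, the classic signed/unsigned comparison trap caused by the usual arithmetic conversions:

#include <stdio.h>

int main(void)
{
    int          s = -1;
    unsigned int u = 1;

    /* The usual arithmetic conversions convert s to unsigned int here,
       turning -1 into UINT_MAX, so the comparison is false. */
    if (s < u) {
        puts("-1 < 1u, as you might expect");
    } else {
        puts("-1 is NOT less than 1u");   /* this is what actually prints */
    }

    return 0;
}

Similarly, on a platform with 32-bit int, an expression like 1 << 31 is undefined behavior while 1u << 31 is well defined, which is the kind of trap the second point is about.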

By declaring your integers unsigned, you automatically skip past a whole lot of the above dangers. Similarly, by declaring them as large as unsigned int or larger, you get rid of lots of dangers caused by the integer promotions.

Both size and signedness are important when it comes to writing rugged, portable and safe code. This is the reason why you should always use the types from stdint.h and not the native, so-called "primitive data types" of C.
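For example (just a sketch of the habit being described), spelling out both width and signedness at the point of declaration removes most of the guesswork:

#include <stdint.h>

uint32_t crc;       /* exactly 32 bits, never negative, well-defined wraparound   */
int64_t  balance;   /* signed on purpose: this value can legitimately go below zero */
uint8_t  flags;     /* small bit container - but it still promotes to int in expressions */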


So I asked myself: «is there a good reason for this, or do people just use signed integers because they don't care»?

I don't really think it is because they don't care, nor because they are lazy, even though declaring everything int is sometimes referred to as "sloppy typing" - which refers to sloppily picked types rather than to being too lazy to type.

I rather believe it is because they lack deeper knowledge of the various things I mentioned above. There's a frightening amount of seasoned C programmers who don't know how implicit type promotions work in C, nor how signed types can cause poorly-defined behavior when used together with certain operators.

This is actually a very frequent source of subtle bugs. Many programmers find themselves staring at a compiler warning or a peculiar bug, which they can make go away by adding a cast. But they don't understand why, they simply add the cast and move on.


for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}

To me, this is just bad design

Indeed it is.

Once upon a time, down-counting loops could yield more efficient code, because the compiler could pick a "branch if zero" instruction instead of a "branch if larger/smaller/equal" instruction - the former is faster. But this was at a time when compilers were really dumb and I don't believe such micro-optimizations are relevant any longer.

So there is rarely ever a reason to have a down-counting loop. Whoever made the argument probably just couldn't think outside the box. The example could have been rewritten as:

for(unsigned int i=0; i<foo.Length(); i++)
{
  unsigned int index = foo.Length() - i - 1;
  thing[index] = something;
}

This code should not have any impact on performance, but the loop itself became a whole lot easier to read, while at the same time fixing the bug that your example had.

As far as performance is concerned nowadays, one should probably spend the time pondering which form of data access is most ideal in terms of data cache use, rather than anything else.


Some people may also say that signed integers may be useful, even for non-negative values, to provide an error flag, usually -1.

That's a poor argument. Good API design uses a dedicated error type for error reporting, such as an enum.

Instead of having some hobbyist-level API like

int do_stuff (int a, int b); // returns -1 if a or b were invalid, otherwise the result

you should have something like:

err_t do_stuff (int32_t a, int32_t b, int32_t* result);

// returns ERR_A if a is invalid, ERR_B if b is invalid, ERR_XXX if... and so on
// the result is stored in [result], which is allocated by the caller
// upon errors the contents of [result] remain untouched

The API would then consistently reserve the return of every function for this error type.
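A hedged sketch of what such an err_t and its caller side could look like (the enum values and names are illustrative, not part of any particular API):

#include <stdint.h>
#include <stdio.h>

/* Illustrative error type: every function of the API returns one of
   these values, and ERR_OK means the out-parameters are valid. */
typedef enum {
    ERR_OK = 0,
    ERR_A,         /* first operand invalid  */
    ERR_B,         /* second operand invalid */
    ERR_OVERFLOW   /* result would not fit   */
} err_t;

err_t do_stuff (int32_t a, int32_t b, int32_t* result);

void caller_example (int32_t x, int32_t y)
{
    int32_t result;
    err_t   err = do_stuff(x, y, &result);

    if (err != ERR_OK) {
        fprintf(stderr, "do_stuff failed with error %d\n", (int)err);
        return;   /* [result] was left untouched, as documented */
    }

    /* result is valid here */
}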

(And yes, many of the standard library functions abuse return types for error handling. This is because it contains lots of ancient functions from a time before good programming practice was invented, and they have been preserved the way they are for backwards-compatibility reasons. So just because you find a poorly-written function in the standard library, you shouldn't run off to write an equally poor function yourself.)


Overall, it sounds like you know what you are doing and giving signedness some thought. That probably means that knowledge-wise, you are actually already ahead of the people who wrote those posts and guides you are referring to.

The Google style guide, for example, is questionable. The same could be said about lots of other such coding standards that rely on "proof by authority": just because it says Google, NASA or Linux kernel, people blindly swallow them no matter the quality of the actual contents. There are good things in those standards, but they also contain subjective opinions, speculations or blatant errors.

Instead, I would recommend referring to real professional coding standards, such as MISRA-C. It enforces lots of thought and care for things like signedness, type promotion and type size, where less detailed/less serious documents just skip past them.

There is also CERT C, which isn't as detailed and careful as MISRA, but it is at least a sound, professional document (and one more focused towards desktop/hosted development).

answered Oct 17 '22 by Lundin