
What are the ramifications of returning the value -1 as a size_t return value in C?

Tags: c, size-t

I am reading a textbook and one of the examples does this. Below, I've reproduced the example in abbreviated form:

#include <stdio.h>
#define SIZE 100

size_t linearSearch(const int array[], int searchVal, size_t size);

int main(void)
{
    int myArray[SIZE];
    int mySearchVal;
    size_t returnValue;

    // populate array with data & prompt user for the search value

    // call linear search function
    returnValue = linearSearch(myArray, mySearchVal, SIZE);

    if (returnValue != -1)
        puts("Value Found");
    else
        puts("Value Not Found");
}

size_t linearSearch(const int array[], int key, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (key == array[i])
            return i;
    }
    return -1;
}

Are there any potential problems with this? I know size_t is defined as an unsigned integral type so it seems as if this might be asking for trouble at some point if I'm returning -1 as a size_t return value.

Joseph Mills asked Oct 18 '22 10:10


2 Answers

There are a few APIs that come to mind which use the maximum signed or unsigned integer value as a sentinel. For example, C++'s std::string::find() method returns std::string::npos if the value given to find() cannot be found within the string, and std::string::npos is equal to (std::string::size_type)-1.

Similarly, on iOS and OS X, NSArray's indexOfObject: method returns NSNotFound when the object cannot be found in the array. Surprisingly, NSNotFound is actually defined as NSIntegerMax, which is INT_MAX on 32-bit platforms or LONG_MAX on 64-bit platforms, even though NSArray indexes are typically NSUInteger (unsigned int on 32-bit platforms or unsigned long on 64-bit platforms).

Returning (size_t)-1 as the sentinel does mean that there is no distinction between “not found” and “element number 18,446,744,073,709,551,615” (on 64-bit systems), but whether that is an acceptable trade-off is up to you.
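The sentinel approach can be sketched in C by giving the value a name, in the spirit of std::string::npos. The names NOT_FOUND and linear_search below are hypothetical, chosen for illustration; they come from neither the textbook nor any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel name, modelled on std::string::npos. */
#define NOT_FOUND ((size_t)-1)

/* Returns the index of the first element equal to key, or NOT_FOUND. */
size_t linear_search(const int array[], int key, size_t size)
{
    assert(array != NULL || size == 0);   /* precondition of this sketch */

    for (size_t i = 0; i < size; i++) {
        if (array[i] == key)
            return i;
    }
    return NOT_FOUND;
}
```

A caller then writes `if (result != NOT_FOUND)` rather than comparing against a bare -1, so both sides of the comparison are already size_t.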

An alternative is to have the function return the index through a pointer argument and have the function's return value indicate success or failure, e.g.

#include <stdbool.h>
#include <stddef.h>

bool linearSearch(const int array[], int val, size_t size, size_t *index)
{
    for (size_t i = 0; i < size; i++)
    {
        if (array[i] == val)
        {
            *index = i; // report where the value was found
            return true;
        }
    }

    *index = 0; // optional; in some cases it is better to leave *index untouched
    return false;
}
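As a rough sketch of how a caller might use this form: the variant below leaves *index untouched on failure (the alternative mentioned in the comment above), and reportSearch is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Same signature as above; this variant leaves *index untouched on failure. */
bool linearSearch(const int array[], int val, size_t size, size_t *index)
{
    assert(index != NULL);   /* the caller must supply somewhere to store the index */

    for (size_t i = 0; i < size; i++) {
        if (array[i] == val) {
            *index = i;
            return true;
        }
    }
    return false;
}

/* Hypothetical caller: prints the outcome and returns whether val was found. */
bool reportSearch(const int array[], int val, size_t size)
{
    size_t index;

    if (linearSearch(array, val, size, &index)) {
        printf("Value found at index %zu\n", index);
        return true;
    }
    puts("Value Not Found");
    return false;
}
```

With this shape every size_t value remains a valid index, and the caller never compares an index against a sentinel.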
dreamlax answered Dec 08 '22 08:12


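Your compiler may decide to complain about comparing signed with unsigned (GCC or Clang will if provoked*), but otherwise "it works". On two's-complement machines (most machines these days), (size_t)-1 is the same as SIZE_MAX. Indeed, as discussed in extenso in the comments, it is the same on one's-complement or sign-magnitude machines too, because of the wording in §6.3.1.3 of the C99 and C11 standards: a value that cannot be represented in the unsigned type is converted by repeatedly adding or subtracting one more than the type's maximum value, so -1 becomes SIZE_MAX regardless of the representation.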
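A minimal sketch of that conversion, with a hypothetical function name; the assertion inside restates the claim rather than proving anything beyond it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX */

/* Returns the result of converting the int value -1 to size_t. */
size_t minus_one_as_size_t(void)
{
    int minus_one = -1;
    size_t converted = (size_t)minus_one;   /* value is -1 + (SIZE_MAX + 1) */

    assert(converted == SIZE_MAX);
    return converted;
}
```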

Using (size_t)-1 to indicate 'not found' means that you can't distinguish between the last entry in the biggest possible array and 'not found', but that's seldom an actual problem.

So, it's just the one edge case where I could end up having a problem?

The array would have to be an array of char, though, to be big enough to cause trouble — and while you could have 4 GiB memory with a 32-bit machine, it's pretty implausible to have all that memory committed to a character array (and it's very much less likely to be an issue with 64-bit machines; most don't run to 16 exbibytes of memory). So it isn't a practical edge case.

In POSIX, there is an ssize_t type, the signed type of the same size as size_t. You could consider using that instead of size_t. However, it causes the same angst that (size_t)-1 causes, in my experience. Plus, on a 32-bit machine, you could have a 3 GiB chunk of memory treated as an array of char, but with ssize_t as a return type, you couldn't usefully use more than 2 GiB, or you'd need to use SSIZE_MIN (if it existed; I'm not sure it does) instead of -1 as the signal value.
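For comparison, a sketch of the ssize_t variant. It assumes a POSIX system (ssize_t is not ISO C) and, as described above, that ssize_t has the same width as size_t; linear_search_signed is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>      /* SIZE_MAX */
#include <sys/types.h>   /* ssize_t (POSIX, not ISO C) */

/* Returns the index of the first element equal to key, or -1 if absent. */
ssize_t linear_search_signed(const int array[], int key, size_t size)
{
    /* Indexes in the upper half of the size_t range cannot be returned,
       assuming ssize_t is the same width as size_t. */
    assert(size <= SIZE_MAX / 2);

    for (size_t i = 0; i < size; i++) {
        if (array[i] == key)
            return (ssize_t)i;
    }
    return -1;
}
```

The caller can test `result < 0` or `result == -1` without any signed/unsigned conversion, at the cost of halving the range of representable indexes.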


* GCC or Clang has to be provoked fairly hard. Simply using -Wall is not sufficient; it takes -Wextra (or the specific -Wsign-compare option) to trigger a warning. Since I routinely compile with -Wextra, I'm aware of the issue; not everyone is as vigilant.

Comparing signed and unsigned quantities is fully defined by the standard, but can lead to counter-intuitive results (because small negative numbers appear very large when converted to unsigned values), which is why the compilers complain if requested to do so.
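The counter-intuitive result can be sketched as follows. The function names are hypothetical, and the casts spell out the conversion that C performs implicitly in a mixed comparison on platforms where size_t is at least as wide as unsigned int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors what `lhs < rhs` does implicitly for an int and a size_t:
   lhs is converted to size_t before the comparison. */
bool int_less_than_size(int lhs, size_t rhs)
{
    return (size_t)lhs < rhs;
}

/* Mirrors the question's `returnValue != -1` test with the cast made explicit. */
bool is_not_found(size_t returnValue)
{
    return returnValue == (size_t)-1;
}

/* Hypothetical demo collecting the surprising and unsurprising cases. */
void signed_unsigned_demo(void)
{
    assert(!int_less_than_size(-1, 1));   /* -1 converts to SIZE_MAX, which is not < 1 */
    assert(int_less_than_size(0, 1));     /* non-negative values behave as expected */
    assert(is_not_found((size_t)-1));
    assert(!is_not_found(0));
}
```

Writing the cast explicitly, as in is_not_found, documents the intent and should keep -Wsign-compare quiet.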

Jonathan Leffler answered Dec 08 '22 10:12