Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is ~size_t(0) (== 0xFFFFFFFF in most 32-bit systems) not a valid array index?

Tags:

Quoting from this blogpost:

http://www.codesynthesis.com/~boris/blog/2008/10/13/writing-64-bit-safe-code/

This works because a valid memory index can only be in the [0, ~size_t(0)-1] range. The same approach, for example, is used in std::string.

So why is ~size_t(0) (this should usually equal 0xFFFFFFFF in 32-bit systems) not a valid array index? I assume that if you have 32 bits you should be able to reference the whole range [0, 0xFFFFFFFF], no?

like image 496
Roland Avatar asked Sep 04 '11 22:09

Roland


People also ask

Can Size_t be used as index?

size_t is an unsigned integer that is capable of holding the size of the largest object you can allocate. It is useful for indexing because this means it can index into the largest array you can allocate.

Why is Size_t unsigned?

size_t is unsigned for historical reasons. On an architecture with 16 bit pointers, such as the "small" model DOS programming, it would be impractical to limit strings to 32 KB.

What should Size_t be used for?

size_t is commonly used for array indexing and loop counting. Programs that use other types, such as unsigned int, for array indexing may fail on, e.g. 64-bit systems when the index exceeds UINT_MAX or if it relies on 32-bit modular arithmetic.


1 Answers

IMPORTANT NOTE: The term "memory index" is ambiguous and confusing. The linked article refers strictly to array indexes, not addresses in memory. It is entirely valid for size_t to be incapable of representing all memory addresses, which is why we have the intptr_t type in C99. Of course, this doesn't apply to your workstation, which undoubtedly has a simple Von Neumann type architecture. (The question has since been edited to remove references to "memory indexes".)

The C standard guarantees that size_t can hold the size of any array. However, for any array a[N], the standard guarantees that a + N must be a valid pointer and compare not equal to any pointer to an element of a.

Therefore, size_t must be able to represent at least one value larger than any possible array index. Since ~(size_t)0 is guaranteed to be the maximum size_t value, it is a good choice of sentinel for array indexes.

Discussion:

  1. Why is ~(size_t)0 guaranteed to be the maximum? Because the standard explicitly says so: from §6.5.3.3: "If the promoted type is an unsigned type, the expression ~E is equivalent to the maximum value representable in that type minus E." Note that (size_t)-1 is guaranteed to also be the maximum by the rules of conversion from signed to unsigned types. Unfortunately, it is not always easy to find the definition for SIZE_MAX on your platform, so (size_t)-1 and ~(size_t)0 are preferred. (Note that this is no longer true if int can represent SIZE_MAX… but this isn't something that would happen in a real system.)

  2. What is the size of an array indexed from 0 to ~0? Such an array cannot exist according to the C standard, by the argument outlined at the top of this post.

  3. If you malloc(-1), the resulting memory region would have to start at 0. (FALSE) There are a lot of really bizarre cases which the standard allows but one doesn't encounter in practice. For example, imagine a system where (uintptr_t)-1 > (size_t)-1. The C standard is worded in exactly the way it is because it doesn't just run on your PC, it runs on bizarre little DSPs with Harvard architectures, and it runs on archaic systems with byzantine memory segmenting schemes. There are also some systems of historical interest where NULL pointers do not have the same representation as 0.

like image 66
Dietrich Epp Avatar answered Oct 25 '22 18:10

Dietrich Epp