Quoting from this blogpost:
http://www.codesynthesis.com/~boris/blog/2008/10/13/writing-64-bit-safe-code/
This works because a valid memory index can only be in the [0, ~size_t(0)-1] range. The same approach, for example, is used in std::string.
So why is ~size_t(0)
(this should usually equal 0xFFFFFFFF
in 32-bit systems) not a valid array index? I assume that if you have 32 bits you should be able to reference the whole range [0, 0xFFFFFFFF], no?
size_t is an unsigned integer that is capable of holding the size of the largest object you can allocate. It is useful for indexing because this means it can index into the largest array you can allocate.
size_t is unsigned for historical reasons. On an architecture with 16 bit pointers, such as the "small" model DOS programming, it would be impractical to limit strings to 32 KB.
size_t is commonly used for array indexing and loop counting. Programs that use other types, such as unsigned int, for array indexing may fail on, e.g. 64-bit systems when the index exceeds UINT_MAX or if it relies on 32-bit modular arithmetic.
IMPORTANT NOTE: The term "memory index" is ambiguous and confusing. The linked article refers strictly to array indexes, not addresses in memory. It is entirely valid for
size_t
to be incapable of representing all memory addresses, which is why we have theintptr_t
type in C99. Of course, this doesn't apply to your workstation, which undoubtedly has a simple Von Neumann type architecture. (The question has since been edited to remove references to "memory indexes".)
The C standard guarantees that size_t
can hold the size of any array. However, for any array a[N]
, the standard guarantees that a + N
must be a valid pointer and compare not equal to any pointer to an element of a
.
Therefore, size_t
must be able to represent at least one value larger than any possible array index. Since ~(size_t)0
is guaranteed to be the maximum size_t
value, it is a good choice of sentinel for array indexes.
Discussion:
Why is ~(size_t)0
guaranteed to be the maximum? Because the standard explicitly says so: from §6.5.3.3: "If the promoted type is an unsigned type, the expression ~E
is equivalent to the maximum value representable in that type minus E
." Note that (size_t)-1
is guaranteed to also be the maximum by the rules of conversion from signed to unsigned types. Unfortunately, it is not always easy to find the definition for SIZE_MAX
on your platform, so (size_t)-1
and ~(size_t)0
are preferred. (Note that this is no longer true if int
can represent SIZE_MAX
… but this isn't something that would happen in a real system.)
What is the size of an array indexed from 0 to ~0? Such an array cannot exist according to the C standard, by the argument outlined at the top of this post.
If you malloc(-1)
, the resulting memory region would have to start at 0. (FALSE) There are a lot of really bizarre cases which the standard allows but one doesn't encounter in practice. For example, imagine a system where (uintptr_t)-1 > (size_t)-1
. The C standard is worded in exactly the way it is because it doesn't just run on your PC, it runs on bizarre little DSPs with Harvard architectures, and it runs on archaic systems with byzantine memory segmenting schemes. There are also some systems of historical interest where NULL
pointers do not have the same representation as 0.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With