
What is the correct definition of size_t? [duplicate]

Tags: c, size-t, c99, c11

First of all, what do I mean by "correct definition"?

For example, K&R, in "The C Programming Language", 2nd ed., section 2.2 "Data Types and Sizes", make very clear statements about integers:

  • There are short, int and long for integer types. They are needed to represent values with different ranges.
  • int is a "naturally" sized number for the specific hardware, so it is probably also the fastest.
  • Sizes for integer types short, int and long are purely implementation-dependent.
  • But they have restrictions.
  • short and int shall hold at least 16 bits.
  • long shall hold at least 32 bits.
  • short <= int <= long (short is no longer than int, which is no longer than long).
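
The guarantees in this list can be checked against the macros in <limits.h>. The following is only a minimal sketch; the function name integer_limits_hold is a hypothetical helper invented for illustration, and the ordering is expressed through the maximum values rather than through sizeof.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when the minimum ranges and the
   ordering described in the list hold on this implementation. */
bool integer_limits_hold(void)
{
    return SHRT_MAX >= 32767            /* short covers at least 16 bits */
        && INT_MAX  >= 32767            /* int covers at least 16 bits   */
        && LONG_MAX >= 2147483647L      /* long covers at least 32 bits  */
        && SHRT_MAX <= INT_MAX          /* short is no wider than int    */
        && INT_MAX  <= LONG_MAX;        /* int is no wider than long     */
}
```

On a conforming implementation this function should always return true; the point is only to show where the limits live.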

That's very clear and unambiguous. That is not the case for the size_t type. In K&R 5.4, "Address Arithmetic", they say:

  • ...size_t is the unsigned integer type returned by the sizeof operator.
  • The sizeof operator yields the number of bytes required to store an object of the type of its operand.

In C99 standard draft, in 6.5.3.4 The sizeof operator, they say:

  • The value of the result is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).

In 7.17 Common definitions :

  • size_t which is the unsigned integer type of the result of the sizeof operator;

In 7.18.3 Limits of other integer types:

  • limit of size_t SIZE_MAX 65535

There is also a useful article - Why size_t matters. It says the following:

  • Okay, let's try to imagine what it would be like if there were no size_t.
  • For example, let's take void *memcpy(void *s1, void const *s2, size_t n); standard function from <string.h>
  • Let's use int instead of size_t for n parameter.
  • But size of memory can't be negative, so let's better take unsigned int.
  • Good, seems like we are happy now and without size_t.
  • But unsigned int has limited size - what if there is some machine, that can copy chunks of memory larger than unsigned int can hold?
  • Okay, let's use unsigned long then, now we are happy?
  • But for those machines, which operate with smaller memory chunks, unsigned long would be inefficient, because long is not "natural" for them, they must perform additional operations to work with longs.
  • So that's why we need size_t: to represent a size of memory that particular hardware can operate on at once. On some machines it would be equal to int, on others to long, depending on which type they handle most efficiently.
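
The reasoning in this list can be made concrete with a memcpy-like function whose count parameter is size_t. This is a sketch only: copy_bytes is a hypothetical name, not a standard function, and it assumes the two regions do not overlap.

```c
#include <stddef.h>

/* Hypothetical memcpy-like helper. The byte count is a size_t, so it can
   describe any object size the implementation supports, whatever integer
   type size_t happens to be. The regions must not overlap. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];    /* copy one byte at a time */
    }
    return dst;
}
```

A typical call passes a sizeof expression as the count, for example copy_bytes(dst, src, sizeof src) when src is an array.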

What I understand from this is that size_t is strictly bound to the sizeof operator, and therefore size_t represents the maximum size of an object in bytes. It might also represent the number of bytes that a particular CPU model can move at once.

But there is still a lot of mystery for me here:

  • What is an "object" in terms of C?
  • Why is it limited to 65535, which is the maximum number that can be represented in 16 bits? The article on embedded.com says that size_t could be 32 bits too.
  • K&R says that int has the "natural" size for the platform, and it can be equal to int or to long. So why not use it instead of size_t if it's "natural"?

UPDATE

There is similar question:

What is size_t in C?

But the answers to it don't provide a clear definition or links to authoritative sources (unless Wikipedia counts as one).

I want to know when to use size_t, when not to use size_t, why it was introduced, and what it really represents.

Gill Bates asked Aug 29 '15 13:08


3 Answers

when to use size_t

Use size_t to represent non-negative indexes, and to work with values that could be traced back to a sizeof expression.

when to not use size_t

Whenever a value could possibly be negative, e.g. when you subtract pointers. This is allowed for pointers into the same array, but it may yield a negative number, depending on the relative positions of pointers. There is another type ptrdiff_t defined for this situation.
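
A small sketch of that case, assuming a hypothetical helper named element_distance: the result of subtracting two pointers has the signed type ptrdiff_t from <stddef.h>, so it can be negative.

```c
#include <stddef.h>

/* Hypothetical helper: signed distance, in elements, from `from` to `to`.
   Both pointers must point into (or one past the end of) the same array,
   otherwise the subtraction is undefined. */
ptrdiff_t element_distance(const int *from, const int *to)
{
    return to - from;
}
```

Storing such a result in a size_t would silently turn a negative distance into a very large positive number.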

why it was introduced

Designers of the standard had a choice of introducing a separate type, or requiring an existing type to be capable of holding sizes. The first choice gives compiler writers more flexibility, so the designers went with a separate type.

what it really represents

It is capable of representing the size of an object in memory, be it an array, a struct, an array of structs, an array of arrays of structs, or anything else. The size is expressed in bytes.

The type is also convenient to use for non-negative indexes, because it can represent an index into a structure of any size at the finest granularity (i.e. an index into the largest possible array of chars, because the standard requires char to have the smallest possible size of 1).
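
As an illustration of the index use, here is a sketch with a hypothetical function sum_ints; both the element count and the loop index are size_t, so the loop works for an array of any size the implementation allows.

```c
#include <stddef.h>

/* Hypothetical helper: adds up `count` ints. The count and the index are
   both size_t, matching the type that sizeof produces. */
long sum_ints(const int *values, size_t count)
{
    long total = 0;

    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

For an array v, the count can be derived from sizeof: sum_ints(v, sizeof v / sizeof v[0]).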

Sergey Kalinichenko answered Sep 21 '22 05:09


What is "object" in terms of C?

"Object" is a defined term. The C99 standard defines it as: "region of data storage in the execution environment, the contents of which can represent values" (section 3.14). A more colloquial definition might be "the storage in memory for a value." Objects come in different sizes, depending on the type of the value stored. That type includes not only simple types such as char and int, but also complex types such as structures and arrays. For example, the storage for an array is an object, within which is an object for each element.

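To illustrate objects nested inside objects, here is a sketch using a hypothetical struct point and two hypothetical helpers, ARRAY_LEN and points_array_size. The array is one object whose size is the sum of the sizes of its element objects, and every such size has type size_t.

```c
#include <stddef.h>

struct point { int x; int y; };   /* hypothetical example type */

/* Element count of an array object. Works only on true arrays,
   not on pointers. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Size in bytes of an array object holding 8 struct point objects. */
size_t points_array_size(void)
{
    struct point pts[8];
    return sizeof pts;
}
```

Because arrays have no padding between elements, points_array_size() equals 8 * sizeof(struct point).
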
Why is it limited to 65535, which is the maximum number that can be represented in 16 bits? The article on embedded.com says that size_t could be 32 bits too.

You misunderstand. Re-read the first two paragraphs of section 7.18.3. SIZE_MAX represents the maximum value of type size_t, but its actual value is implementation-dependent. The value given in the standard is the minimum that SIZE_MAX can be; in most implementations it is larger.
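
A sketch of how that can be checked on a given implementation; the two function names are hypothetical. SIZE_MAX comes from <stdint.h> (C99 and later).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: the standard's 65535 is a lower bound on SIZE_MAX,
   not its required value. */
bool size_max_meets_minimum(void)
{
    return SIZE_MAX >= 65535u;
}

/* Hypothetical check: SIZE_MAX is the largest value of size_t, which for
   an unsigned type is what converting -1 to that type yields. */
bool size_max_is_largest_value(void)
{
    return SIZE_MAX == (size_t)-1;
}
```

Both functions should return true on any conforming implementation, whether size_t is 16, 32 or 64 bits wide.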

K&R says that int has the "natural" size for the platform, and it can be equal to int or to long. So why not use it instead of size_t if it's "natural"?

Because there is no particular reason that the maximum size of an object should be limited to the number of bytes expressible in a single machine word (which is pretty much what the "natural size" means). Nor, where int and long differ in size, is it clear which one should correspond to size_t, if either. Using size_t instead of one of these abstracts away machine details and makes your code more portable.

In response to the update:

I want to know when to use size_t, when not to use size_t, why it was introduced, and what it really represents.

size_t is primarily defined as the type of the result of sizeof. It follows that what it "really represents" is the size of an object.

Use size_t to hold values that represent or are related to the size of an object. That is expressly what it's for. For the most part, you can accomplish this by type matching: use variables of type size_t to store the values declared to have that type, such as the return values of certain functions (e.g. strlen()) and the results of certain operators (e.g. sizeof).
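
Type matching might look like the following sketch, where storage_needed is a hypothetical helper: the value returned by strlen() is kept in a size_t, and so is the size derived from it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes needed to store a copy of `s`,
   including the terminating null character. */
size_t storage_needed(const char *s)
{
    size_t len = strlen(s);   /* strlen() returns size_t */
    return len + 1;           /* one more byte for '\0' */
}
```

For a string literal the result agrees with sizeof, e.g. storage_needed("abc") equals sizeof "abc".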

Do not use size_t for values that represent something other than an object size or something closely related (such as the sum or positive difference of object sizes).

John Bollinger answered Sep 22 '22 05:09


Why is it limited to 65535, which is the maximum number that can be represented in 16 bits?

It's at least 16 bits.

According to the 1999 ISO C standard (C99), size_t is an unsigned integer type of at least 16 bits (see sections 7.17 and 7.18.3).

Why use size_t?

size_t is a type guaranteed to hold any array index.

size_t can be unsigned char, unsigned short, unsigned int, unsigned long, unsigned long long, or some other unsigned integer type, depending on the implementation.

As for using unsigned int or unsigned long in place of size_t, the problem is similar: they are not the only unsigned integer types, so neither is guaranteed to match size_t.

Its purpose is to relieve the programmer from having to worry about which of the predefined types is used to represent sizes.

On one system, it might make sense to use unsigned int to represent sizes; on another, it might make more sense to use unsigned long or unsigned long long.

So using size_t adds the advantage that code is likely to be more portable.

ameyCU answered Sep 21 '22 05:09