
What does ((size_t*)ptr)[-1] mean in C?

Tags: c, pointers, size

I want to know the size of the memory block allocated for a pointer.

So I found this answer: "How can I know the allocated memory size of pointer variable in C".

And it has the below code.

#include <stdlib.h>
#include <stdio.h>

void * my_malloc(size_t s) 
{
  size_t * ret = malloc(sizeof(size_t) + s);
  *ret = s;
  return &ret[1];
}

void my_free(void * ptr) 
{
  free( (size_t*)ptr - 1);
}

size_t allocated_size(void * ptr) 
{
  return ((size_t*)ptr)[-1];
}

int main(int argc, const char ** argv) 
{
  int * array = my_malloc(sizeof(int) * 3);
  printf("%u\n", allocated_size(array));
  my_free(array);
  return 0;
}

The line (((size_t*)ptr)[-1]) works perfectly, but I don't understand why...

Can someone help me understand this magic line? Thanks!

asked Dec 14 '18 12:12 by Firerazzer


3 Answers

If ptr is pointing to a block of memory allocated by malloc, calloc, realloc, etc., then ((size_t*)ptr)[-1] invokes undefined behavior. My guess is that it is relying on the behavior of some vendor's implementation of the standard library that happens to store the size of a memory block in the location just before the location returned by malloc etc.

DO NOT USE SUCH HACKS! If a program is allocating memory dynamically, it should be able to keep track of the sizes of memory it allocated without relying on undefined behavior.
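
One portable way to keep track of sizes is to carry the size alongside the pointer. This is only a minimal sketch; the names sized_buf, sized_buf_alloc and sized_buf_free are hypothetical and not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper type: a buffer that remembers its own size. */
struct sized_buf {
    void  *data;  /* pointer returned by malloc, or NULL */
    size_t size;  /* number of bytes requested, 0 if data is NULL */
};

/* Allocate `size` bytes. On failure, data is NULL and size is 0. */
struct sized_buf sized_buf_alloc(size_t size)
{
    struct sized_buf b = { malloc(size), size };
    if (b.data == NULL)
        b.size = 0;
    return b;
}

/* Release the buffer and reset the bookkeeping. */
void sized_buf_free(struct sized_buf *b)
{
    assert(b != NULL);
    free(b->data);
    b->data = NULL;
    b->size = 0;
}
```

The caller reads b.size whenever it needs the size, so nothing has to be recovered from memory in front of the pointer.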

The size of the memory block actually allocated by malloc etc. may be larger than the size requested, so perhaps you are interested in knowing the actual size of the block that was allocated, including the excess memory at the end of the block. Portable code should not need to know this, as accessing locations beyond the requested size is also undefined behavior, but perhaps you want to know this size for curiosity's sake or for debugging purposes.
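
For that debugging case, glibc provides malloc_usable_size, declared in <malloc.h>. The sketch below assumes a glibc-based system; other C libraries may not have this function or may name it differently. The wrapper name usable_size is hypothetical.

```c
#include <assert.h>
#include <malloc.h>   /* glibc-specific: declares malloc_usable_size */
#include <stddef.h>
#include <stdlib.h>

/* Report how many bytes glibc says are usable in a block obtained
   from malloc/calloc/realloc. Not portable; diagnostics only. */
size_t usable_size(void *ptr)
{
    assert(ptr != NULL);
    return malloc_usable_size(ptr);
}
```

The value returned is at least the size that was requested and may be larger. It should not be used to decide how much of the block the program may touch.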

answered Sep 19 '22 22:09 by Ian Abbott


First, let's start with what ((size_t*)ptr)[-1] means.

When you use the array subscript operator as (for example) A[B], this is exactly equivalent to *(A + B). So what is really happening here is pointer arithmetic followed by a dereference. This means that a negative array index is valid, provided that the pointer points far enough into the array that the resulting address is still an element of that array.

As an example:

int a[5] = { 1, 2, 3, 4, 5 };
int *p = a + 2;
printf("p[0] = %d\n", p[0]);      // prints 3
printf("p[-1] = %d\n", p[-1]);    // prints 2
printf("p[-2] = %d\n", p[-2]);    // prints 1

So applying this to ((size_t*)ptr)[-1], this says that ptr points to an element of an array of one or more objects of type size_t (or to one element past the end of the array), and the subscript -1 gets the object just before the one ptr points to.

Now what does this mean in the context of the sample program?

The function my_malloc is a wrapper around malloc that allocates s bytes plus enough bytes for a size_t. It writes the value of s at the start of the malloc'ed buffer as a size_t, then returns a pointer to the memory after the size_t object.

So the memory actually allocated and the returned pointer look something like this (assuming sizeof(size_t) is 8):

        -----
0x80    | s |
0x81    | s |
0x82    | s |
0x83    | s |
0x84    | s |
0x85    | s |
0x86    | s |
0x87    | s |
0x88    |   |   <--- ptr
0x89    |   |
0x8A    |   |
...

When a pointer returned from my_malloc is passed to allocated_size, the function can read the requested size of the buffer with ((size_t*)ptr)[-1]:

        -----
0x80    | s |   <--- ptr[-1]
0x81    | s |
0x82    | s |
0x83    | s |
0x84    | s |
0x85    | s |
0x86    | s |
0x87    | s |
0x88    |   |   <--- ptr[0]
0x89    |   |
0x8A    |   |

The cast ptr points to one element past an array of size_t of size 1, so the pointer itself is valid and subsequently getting the object with the array subscript -1 is also valid. This is not undefined behavior as others have suggested, since the pointer is being converted to/from a void * and points to a valid object of the specified type.

In this implementation, only the size of the requested buffer is stored before the returned pointer. However, you could store more metadata there, provided you allocate enough extra space for it.

The one thing this doesn't take into account is that the memory returned by malloc is suitably aligned for any purpose, and the pointer returned by my_malloc may not fit that requirement. So an object placed at the returned address may have an alignment issue and cause a crash. To account for this, additional bytes would need to be allocated to fit that requirement, and allocated_size and my_free would also need to be adjusted to account for that.
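
A minimal sketch of that adjustment, assuming C11 (for max_align_t and alignof). The header is padded up to a multiple of the strictest fundamental alignment, so the returned pointer stays as aligned as the one malloc returned. The names HEADER_SIZE, my_malloc_aligned, allocated_size_aligned and my_free_aligned are hypothetical and not part of the original code.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* sizeof(size_t) rounded up to a multiple of alignof(max_align_t). */
#define HEADER_SIZE \
    ((sizeof(size_t) + alignof(max_align_t) - 1) \
     / alignof(max_align_t) * alignof(max_align_t))

void *my_malloc_aligned(size_t s)
{
    if (s > SIZE_MAX - HEADER_SIZE)       /* avoid overflow in the sum */
        return NULL;
    unsigned char *base = malloc(HEADER_SIZE + s);
    if (base == NULL)
        return NULL;
    *(size_t *)base = s;                  /* size lives at the block start */
    return base + HEADER_SIZE;            /* payload starts after the padding */
}

size_t allocated_size_aligned(void *ptr)
{
    assert(ptr != NULL);
    return *(size_t *)((unsigned char *)ptr - HEADER_SIZE);
}

void my_free_aligned(void *ptr)
{
    if (ptr != NULL)
        free((unsigned char *)ptr - HEADER_SIZE);
}
```

The cost is some wasted padding per allocation (often 8 bytes on 64-bit systems, though that figure depends on the implementation).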

answered Sep 20 '22 22:09 by dbush


First, let’s explain what (((size_t*)ptr)[-1]) does, assuming that it is valid:

  • (size_t*)ptr converts ptr to the type “pointer to size_t”.
  • ((size_t *)ptr)[-1] is, by definition [1], equivalent to *((size_t *) ptr - 1) [2]. That is, it subtracts 1 from (size_t *) ptr and “dereferences” the resulting pointer.
  • Pointer arithmetic is defined in terms of array elements and treats a single object as an array of one element [3]. If (size_t *) ptr is pointing “just beyond” a size_t object, then (size_t *) ptr - 1 points to the size_t object, and *((size_t *) ptr - 1) is that object.
  • Thus, (((size_t*)ptr)[-1]) is the size_t object that is just before ptr.
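
The equivalence in the list above can be checked with a small sketch. The two function names are hypothetical; both read the size_t stored immediately before the address they are given, and both assume such an object really exists there.

```c
#include <assert.h>
#include <stddef.h>

/* Subscript form: ((size_t *)ptr)[-1] */
size_t size_before_subscript(void *ptr)
{
    assert(ptr != NULL);
    return ((size_t *)ptr)[-1];
}

/* Explicit form: subtract 1 from the converted pointer, then dereference. */
size_t size_before_arith(void *ptr)
{
    assert(ptr != NULL);
    return *((size_t *)ptr - 1);
}
```

Given size_t cells[2] = { 42, 0 }, passing &cells[1] to either function yields cells[0], since the two expressions are the same by definition.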

Now, let’s discuss whether this expression is valid. ptr is obtained by this code:

void * my_malloc(size_t s) 
{
  size_t * ret = malloc(sizeof(size_t) + s);
  *ret = s;
  return &ret[1];
}

If malloc succeeds, it allocates space for any object of the requested size [4]. So we can certainly store a size_t there [5], except that this code ought to check the return value to guard against allocation failure. Furthermore, we may return &ret[1]:

  • &ret[1] is equivalent to &*(ret + 1), which is equivalent to ret + 1. This points one beyond the size_t we have stored at ret, which is valid pointer arithmetic.
  • The pointer is converted to the function return type, void *, which is valid [6].
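
A sketch of the same three functions with the missing failure check added. The _checked names are hypothetical, and the overflow test on the size sum is an extra precaution not discussed in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

void *my_malloc_checked(size_t s)
{
    if (s > SIZE_MAX - sizeof(size_t))   /* sizeof(size_t) + s would wrap */
        return NULL;
    size_t *ret = malloc(sizeof(size_t) + s);
    if (ret == NULL)                     /* allocation failed: nothing to store */
        return NULL;
    *ret = s;
    return ret + 1;                      /* same as &ret[1] */
}

void my_free_checked(void *ptr)
{
    if (ptr != NULL)                     /* mirror free(NULL) being a no-op */
        free((size_t *)ptr - 1);
}

size_t allocated_size_checked(void *ptr)
{
    assert(ptr != NULL);
    return ((size_t *)ptr)[-1];
}
```

The alignment caveat discussed in this thread still applies to the returned pointer.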

The code shown in the question does only two things with the value returned from my_malloc: retrieve the stored size with ((size_t*)ptr)[-1] and free the space using (size_t*)ptr - 1. These are both valid since the pointer conversion is appropriate and they are operating within the limits of pointer arithmetic.

However, there is a question about what further uses the returned value can be put to. As others have noted, while the pointer returned from malloc is suitably aligned for any object, the addition of a size_t produces a pointer that is suitably aligned only for an object whose alignment requirement is not stricter than size_t. For example, in many C implementations, this would mean the pointer could not be used for a double, which often requires eight-byte alignment while size_t is merely four bytes.

So we immediately see that my_malloc is not a full replacement for malloc. Nonetheless, perhaps it could be used only for objects with satisfactory alignment requirements. Let’s consider that.

I think many C implementations would have no trouble with this, but, technically, there is a problem here: malloc is specified to return space for one object of the requested size. That object can be an array, so the space can be used for multiple objects of the same type. However, it is not specified that the space can be used for multiple objects of different types. So, if some object other than a size_t is stored in the space returned by my_malloc, I do not see that the C standard defines the behavior. As I noted, this is a pedantic distinction; I do not expect a C implementation to have a problem with this, although increasingly aggressive optimizations have surprised me over the years.

One way to store multiple different objects in the space returned by malloc is to use a structure. Then we could put an int or a float or char * in the space after the size_t. However, we cannot do so by pointer arithmetic—using pointer arithmetic to navigate the members of a structure is not fully defined. Addressing structure members is properly done by name, not pointer manipulations. So returning &ret[1] from my_malloc is not a valid way (defined by the C standard) to provide a pointer to space that may be used for any object (even if the alignment requirement is satisfied).
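
A sketch of the structure approach, assuming C99 flexible array members and a payload whose element type is known in advance (int here). The names int_block, int_block_new and int_block_get are hypothetical. Every access goes through a member name rather than through arithmetic on the returned pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type: the size and the payload live in one structure. */
struct int_block {
    size_t count;     /* number of elements in values */
    int    values[];  /* flexible array member */
};

/* Allocate a block holding `count` zero-initialized ints, or return NULL. */
struct int_block *int_block_new(size_t count)
{
    if (count > (SIZE_MAX - sizeof(struct int_block)) / sizeof(int))
        return NULL;                      /* size computation would overflow */
    struct int_block *b = malloc(sizeof *b + count * sizeof(int));
    if (b == NULL)
        return NULL;
    b->count = count;
    for (size_t i = 0; i < count; i++)
        b->values[i] = 0;
    return b;
}

/* Bounds-checked read, using the stored count. */
int int_block_get(const struct int_block *b, size_t i)
{
    assert(b != NULL && i < b->count);
    return b->values[i];
}
```

The block is released with a plain free(b). The trade-off is that this is no longer a generic malloc replacement, because the payload type is fixed by the structure.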

Other Notes

This code improperly uses %u to format a value of type size_t:

printf("%u\n", allocated_size(array));

size_t is an unsigned integer type, but which one is implementation-defined, and it might not be unsigned int. When it is not, passing it for %u has behavior that is not defined by the C standard. The proper format specifier is %zu.
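
A small sketch of the %zu form (available since C99). The helper name format_size is hypothetical; it writes into a caller-supplied buffer with snprintf, and printf accepts the same specifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format a size_t as decimal text. Returns what snprintf returns:
   the number of characters the full text needs, excluding the '\0'. */
int format_size(char *buf, size_t buflen, size_t n)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "%zu", n);
}
```

In the question's main, the corresponding call would be printf("%zu\n", allocated_size(array));.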

Footnotes

[1] C 2018 6.5.2.1 2.

[2] More precisely, it is *((((size_t *) ptr)) + (-1)), but these are equivalent.

[3] C 2018 6.5.6 8 and 9.

[4] C 2018 7.22.3.4.

[5] A very pedantic reader of C 2018 7.22.3.4 could object that size_t is not an object of the requested size but is an object of smaller size. I do not believe that is the intended meaning.

[6] C 2018 6.3.2.3 1.

answered Sep 17 '22 22:09 by Eric Postpischil