 

What are the bounds of the heap?

What are the bounds of the heap in a given process? I understand that there is probably no simple answer to this question, so I'm interested in answers to the following specifically:

  • Is there a standard heap size/location for 64-bit processes under Linux on AMD64?
  • If I'm implementing a language runtime, how can I find out where I'm not allowed to put the heap (again, on Linux/AMD64)?
  • Is there a portable way for an application to find out where it begins/ends?
brooks94 asked Feb 18 '14 16:02


2 Answers

I assume you are trying to write your own heap allocator here, and from the tags I assume you are doing it on Linux.

SunEric has given you a useful indication of what memory you might be able to use. However, the memory you can use is the memory that the operating system gives you, i.e. to get memory into your process you need to call the operating system to map virtual memory into the process's address space (with some physical memory behind it). malloc() abstracts this for you, and implements 'the heap' in C. It can get its memory in two ways:

  1. Using the brk system call (mapped to the C library brk or sbrk)

  2. Using mmap with MAP_ANON (or, more precisely, the underlying system call: mmap on AMD64, mmap2 on some 32-bit architectures).

brk is the classical way of allocating memory for the heap, and normally when we talk about 'the heap', we mean memory allocated this way (though brk can be used to allocate memory other than for the heap, and heap items may live elsewhere - see below). Here is a great answer to how brk allocation works, upon which I am unable to improve. What location the memory uses is really a result of arithmetic. The heap follows the BSS of the program when loaded: the program break starts at (or, with address randomization, a little above) the end of the BSS and is moved upwards as the heap expands, so the start is really determined by the OS and the dynamic loader. The end of the heap is thus determined by this start and the size of the heap (i.e. how large you've grown it to).
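To make the arithmetic concrete, here is a minimal sketch that moves the program break with sbrk and reports how far it moved. The function name grow_break is made up for this illustration, and it assumes a glibc-style sbrk on Linux. Mixing your own sbrk calls with malloc is generally discouraged, so treat it as a demonstration only.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* asks glibc to declare sbrk() */
#endif
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: grow the program break by `bytes` and return how far
 * it actually moved, or -1 if sbrk() fails. */
long grow_break(long bytes)
{
    void *before = sbrk(0);                 /* current program break */
    if (sbrk((intptr_t)bytes) == (void *)-1)
        return -1;                          /* kernel refused to move it */
    void *after = sbrk(0);                  /* new program break */
    printf("break moved from %p to %p\n", before, after);
    return (long)((char *)after - (char *)before);
}
```

Calling grow_break(4096) from main() should report a break that is one page higher than before.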

mmap is less clear cut. It takes an addr parameter:

If addr is NULL, then the kernel chooses the address at which to create the mapping; this is the most portable method of creating a new mapping. If addr is not NULL, then the kernel takes it as a hint about where to place the mapping; on Linux, the mapping will be created at a nearby page boundary. The address of the new mapping is returned as the result of the call.

So if you use mmap to get space for particular heap items (as malloc may do, particularly for large objects), the OS chooses the location, with or without a hint from you. If you use MAP_FIXED it will give you exactly that location or fail (note that MAP_FIXED silently replaces any existing mapping in that range). In this sense, your heap (or items within it) could be anywhere the OS will let you map memory.
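As a rough sketch of the mmap route, the helper below maps anonymous memory either at a kernel-chosen address (hint is NULL) or near a suggested one. The name map_anon is hypothetical, and MAP_ANONYMOUS is used as the Linux synonym for MAP_ANON.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* asks glibc to define MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of zero-filled anonymous memory.
 * `hint` may be NULL (the kernel chooses) or a suggested address, which the
 * kernel is free to ignore because MAP_FIXED is not set.
 * Returns NULL on failure. */
void *map_anon(void *hint, size_t len)
{
    void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A runtime that wants its heap at a particular place would pass a hint and then check whether the returned address matches it, rather than assume that it does.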

You asked whether there is a portable way to find out where the heap begins and ends. Portable implies a language, and I'll assume C. In respect of the brk type heap, yes there is (well, reasonably portable). man end gives:

NAME

etext, edata, end - end of program segments

SYNOPSIS

extern etext;

extern edata;

extern end;

DESCRIPTION

The addresses of these symbols indicate the end of various program segments:

  • etext: This is the first address past the end of the text segment (the program code).

  • edata: This is the first address past the end of the initialized data segment.

  • end: This is the first address past the end of the uninitialized data segment (also known as the BSS segment).

The brk heap runs from the initial program break, which sits at or just above end (Linux may randomize the gap), up to the current program break, which sbrk(0) returns. Note that end itself is fixed at link time and does not move as the heap grows. One approach is therefore to take the address of end as the bottom of the heap and the value of sbrk(0) at the time of evaluation as the top. This is a conservative bound: it may include a randomized gap at the bottom, and it includes whatever libc itself and the shared libraries allocated via brk before main() was called, so strictly speaking it may cover things that are not in your heap.

If you didn't mean in C, you need to use a similar technique. Take the 'program break' (i.e. the top of the brk-managed region) and subtract the lowest address you gave for your heap.
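A minimal sketch of reading these values from C, assuming Linux with glibc: it takes the address of end as the lower bound and sbrk(0) as the upper bound. The function name brk_region_size is made up for this example. The symbol has to be declared explicitly, as the man page notes.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* asks glibc to declare sbrk() */
#endif
#include <unistd.h>

extern char end;   /* first address past the BSS, provided by the linker */

/* Hypothetical helper: number of bytes between `end` and the current
 * program break. This is an upper bound on the size of the brk heap. */
long brk_region_size(void)
{
    char *low  = &end;       /* fixed at link time */
    char *high = sbrk(0);    /* moves as the heap grows */
    return (long)(high - low);
}
```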

If you want to see the memory allocation for the heap for an arbitrary process:

$ grep -F '[heap]' /proc/$$/maps
01fe6000-02894000 rw-p 00000000 00:00 0                                  [heap]

Replace $$ (which expands to the PID of the current shell) with the PID of the process you want to examine.
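The same information can be read from inside the process. Here is a sketch that scans /proc/self/maps for the [heap] line, assuming a Linux /proc filesystem is mounted. The name heap_range is hypothetical, and the line may be absent if nothing has used brk yet.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the [heap] mapping of the calling process.
 * Returns 0 and fills *start and *stop on success, -1 if the line is not
 * found or /proc/self/maps cannot be opened. */
int heap_range(unsigned long *start, unsigned long *stop)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    char line[512];
    int rc = -1;
    while (fgets(line, sizeof line, f)) {
        /* Each line begins with "start-end" in hexadecimal. */
        if (strstr(line, "[heap]") &&
            sscanf(line, "%lx-%lx", start, stop) == 2) {
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```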

abligh answered Nov 02 '22 15:11


On current AMD64 CPUs not all 64 address bits are implemented, so you do not get the full 2^64 bytes = 16 EiB of virtual address space. Typical AMD64 implementations use only the lower 48 bits, giving 2^48 bytes = 256 TiB of virtual address space, of which user space gets the lower half. The architecture therefore limits you to at most 256 TiB in theory. How much of that you can actually back with memory depends on RAM plus swap: if you had 256 TiB of disk space and were allowed to use all of it as swap, you could in principle back that much memory. If the number or size of your swap partitions is limited, you are limited to less than that even if the available disk space is large.

In the current 48-bit implementation, the virtual addresses that an AMD64 CPU can use must be in canonical form (depicted in the figure below). They fall into two halves, from 0 to 00007FFFFFFFFFFF and from FFFF800000000000 to FFFFFFFFFFFFFFFF, which together make up the 256 TiB of available virtual address space. The upper half is meant for kernel space and the lower half is user space for the code, heap and stack segments. If more virtual address bits are implemented, the lower half grows upwards, leaving more virtual space for mapping the different segments into memory. This means the heap can grow to at most 128 TiB (the size of the user half), minus whatever the other mappings occupy.

 0xFFFFFFFFFFFFFFFF +-----------+
                    |   Kernel  |
                    |           |
 0xFFFF800000000000 +-----------+
                    |    Non    |
                    | Canonical |
                    |   range   |
 0x00007FFFFFFFFFFF +-----------+
                    |    User   |
                    |           |
                0x0 +-----------+ 
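The layout in the figure can be expressed as a small check. This is a sketch assuming the 48-bit implementation described above; the function names is_canonical48 and is_user_half48 are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `addr` is canonical on a 48-bit implementation, i.e. bits 63..47
 * are all copies of bit 47 (all zeros or all ones). */
bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;           /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFF;
}

/* True if `addr` lies in the lower (user) half shown in the figure. */
bool is_user_half48(uint64_t addr)
{
    return addr <= UINT64_C(0x00007FFFFFFFFFFF);
}
```

A runtime choosing heap addresses by hand would want both checks to pass before passing an address to mmap as a hint.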

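However, the brk heap starts above the program's data and BSS segments and grows upwards, and its current top can be found by calling sbrk with an argument of 0. Because malloc() may also obtain memory with mmap, the heap as a whole is not contiguous, and malloc() may return an address from anywhere in the user part of the virtual address space.

You usually shouldn't need to worry much about how this works at the lowest level, as it is abstracted away by the operating system and the C library on modern systems.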

Sunil Bojanapally answered Nov 02 '22 16:11