What are the bounds of the heap in a given process? I understand that there is probably no simple answer to this question, so I'm interested in answers to the following specifically:
I assume you are trying to write your own heap allocator here, and from the tags assume you are doing it in Linux.
SunEric has given you a useful indication of what memory you might be able to use. However, the memory you can actually use is the memory that the operating system gives you, i.e. to get memory into your process, you need to call the operating system to map virtual memory into the process's address space (and put some physical memory behind it). malloc() abstracts this for you, and implements 'the heap' in C. It can get its memory in two ways:
Using the brk system call (exposed by the C library as brk or sbrk)
Using mmap with MAP_ANON (or more precisely the underlying system call, which is mmap2 on some 32-bit architectures).
brk is the classical way of allocating memory for the heap, and normally when we talk about 'the heap', we mean memory allocated this way (though brk can be used to allocate memory other than for the heap, and heap items may live elsewhere - see below). Here is a great answer to how brk allocation works, upon which I am unable to improve. Where the memory ends up is really a result of arithmetic. The heap follows the BSS of the program when loaded, i.e. the program break starts just above the BSS and is moved upwards as the heap expands, so the start is really determined by the OS and the dynamic loader. The end of the heap is thus determined by this start and the size of the heap (i.e. how large you've grown it).
mmap is less clear cut. It takes an addr parameter:
If addr is NULL, then the kernel chooses the address at which to create the mapping; this is the most portable method of creating a new mapping. If addr is not NULL, then the kernel takes it as a hint about where to place the mapping; on Linux, the mapping will be created at a nearby page boundary. The address of the new mapping is returned as the result of the call.
So if you use mmap to get space for particular heap items (as malloc may do, particularly for large objects), the OS chooses the location, with or without a hint from you. If you use MAP_FIXED, it will either give you exactly that location or fail. In this sense, your heap (or items within it) could be anywhere the OS will let you map memory.
You asked whether there is a portable way to find out where the heap begins and ends. Portable implies a language, and I'll assume C. In respect of the brk type heap, yes there is (well, reasonably portable). man end gives:
NAME
    etext, edata, end - end of program segments
SYNOPSIS
    extern etext;
    extern edata;
    extern end;
DESCRIPTION
    The addresses of these symbols indicate the end of various program segments:
    etext: This is the first address past the end of the text segment (the program code).
    edata: This is the first address past the end of the initialized data segment.
    end: This is the first address past the end of the uninitialized data segment (also known as the BSS segment).
As the heap runs from the end of the BSS at load time to the program break at run time, one approach would be to record the program break (sbrk(0)) at startup as the bottom of the heap and read it again later as the top of the heap. This would miss the fact that libc itself and the shared libraries may allocate things before main() is called. So a more conservative approach would be to say it is the area between the address of end and the current program break, though this might strictly speaking include things not on the heap.
If you didn't mean in C, you need to use a similar technique. Take the 'program break' (i.e. the current top of the brk-managed region) and subtract the lowest address you recorded for your heap; the difference is the size of the heap.
If you want to see the memory allocation for the heap for an arbitrary process:
$ cat /proc/$$/maps | fgrep heap
01fe6000-02894000 rw-p 00000000 00:00 0 [heap]
Replace $$ with the PID of the process you want to examine.
On modern 64-bit AMD64 CPUs, not all address lines are enabled, so we do not get the full 2^64 = 16 exabytes of virtual address space. Typical AMD64 implementations enable only the lower 48 bits, resulting in 2^48 = 256TB of address space. Thus the architecture theoretically limits the total to 256TB, of which only the lower half (128TB) is available to user space. So if you had that much disk space and it were all allowed to be used as swap, you could in theory get a heap approaching that size. If there are limits on the number and size of swap partitions, you are limited to less than that, even if more disk space is available.
In AMD's current 48-bit implementation, the full virtual address range that AMD64 CPUs can address in canonical form (depicted in the figure below) is split into two halves, from 0 to 00007FFFFFFFFFFF and from FFFF800000000000 to FFFFFFFFFFFFFFFF, for a total of 256TB of available virtual address space. The upper half is meant for kernel space, and the lower half is user space for the code, heap and stack segments. As implementations enable more virtual address bits, the lower half grows upwards, leaving more virtual space for mapping the different segments into memory. This means the heap can grow to at most 128TB, the size of the user half (the 256TB figure counts both halves).
0xFFFFFFFFFFFFFFFF +-----------+
                   |  Kernel   |
                   |           |
0xFFFF800000000000 +-----------+
                   |    Non    |
                   | Canonical |
                   |   range   |
0x00007FFFFFFFFFFF +-----------+
                   |   User    |
                   |           |
0x0                +-----------+
However, the heap starts above the program's text and data segments and grows upwards, and its current upper end can be found by calling sbrk with an argument of 0. As the heap is not contiguous, an address returned by malloc() may lie anywhere in the user part of the virtual address space.
You shouldn't need to worry much about how this works at the lowest level, as it is abstracted away on modern systems.