We have a mechanism that monitors load and store instructions and captures the addresses they reference. I'd like to classify each address according to whether it belongs to the stack, the heap, or the region where static variables are allocated. Is there a way to do this classification programmatically?
My initial thought was to call malloc() with a small request (1 byte?) as soon as the process starts running, so that I could capture the "base address" (or starting address) of the heap. That way I could distinguish statically allocated variables from the rest. For references that do not belong to the static region (that is, heap and stack), how could I tell them apart?
Some small tests show that the following simple code (run on Linux 3.18/x86-64, compiled with gcc 4.8.4)
#include <stdio.h>
#include <stdlib.h>

int x;

void foo (void)
{
    int s;
    int *h = malloc (sizeof(int));
    /* %p expects a void *, hence the casts */
    printf ("&x = %p, &s = %p, h = %p\n", (void *)&x, (void *)&s, (void *)h);
    free (h);
}

int main (int argc, char *argv[])
{
    foo();
    return 0;
}
shows some randomization of the address space (not for the static variables, but for the rest: heap and stack). This may add some uncertainty, but maybe there is a way to find the limits of these regions of the address space.
There is no standard C API for this, which means that all possible solutions are going to be based on platform-specific hacks. Also, this answer limits itself to single-threaded applications.
The stack is a contiguous memory region, so all you need to know are two numbers: the top of the stack and the bottom of the stack. The top of the stack is bounded by the stack frame of the current function. However, since the size of the current stack frame cannot be accessed from C code, it is difficult to tell exactly where the current frame ends. The trick here is to call one more function from the current one and use an address in the called function's stack frame as the boundary value for stack_top.
Learning the bottom of the stack is simpler: its value stays constant during the execution of the program, and it is bounded by the stack frame of the entry-point function (main() in C programs). Therefore taking the address of some local variable in main() is a sufficient approximation.
One more caveat is that the x86 stack grows downward, which means that the top of the stack has a smaller address than the bottom. This code sums it up:
#include <stdbool.h>

void *stack_bottom;

bool IS_IN_STACK(void *x) __attribute__((noinline));
bool IS_IN_STACK(void *x) {
    /* a local in this (deeper) frame approximates the current top of the stack */
    void *stack_top = &stack_top;
    return x <= stack_bottom && x >= stack_top;
}

int main (int argc, char *argv[]) {
    int x;
    stack_bottom = &x;
    ...
For static variables the logic is even simpler. They are allocated in a memory region starting at a fixed, platform-specific address. Usually this region precedes all other regions in memory. The only thing that has to be learned, therefore, is the end address of this static memory region.
Luckily, the GNU linker provides the symbols end, edata and etext, which denote the end of the .bss, .data and .text segments respectively. Static variables are allocated in either the .bss or the .data segment, therefore this check should be sufficient on most platforms:
#define IS_STATIC(x) ((void*)(x) <= (void*)&end || (void*)(x) <= (void*)&edata)
This macro checks both edata and end to avoid making assumptions about which of .bss and .data comes first in memory.
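Note that the macro has no lower bound, so any low address (code, or even NULL) is also reported as static. A minimal sketch of a bounded variant follows. It assumes the usual GNU ld layout, where .text comes first and is followed by read-only data, .data and .bss; the function name is_static is hypothetical and not part of the original answer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Linker-provided symbols (see end(3)); only their addresses are meaningful. */
extern char etext;   /* first address past the end of .text */
extern char end;     /* first address past the end of .bss  */

/* Hypothetical helper: true if p lies between the end of the code and the
 * end of .bss. Assumes .text precedes .data/.bss, which is typical but
 * not guaranteed. Read-only data placed after .text also counts as static. */
bool is_static(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)&etext && a < (uintptr_t)&end;
}
```

Comparing through uintptr_t avoids relational comparison of pointers to unrelated objects, which the C standard leaves undefined.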
Heap variables are typically allocated at addresses directly following the .data and .bss regions. However, heap addresses may sometimes belong to non-contiguous memory ranges. Therefore the best you can do here is to read the Linux process files to find out the memory mappings, as suggested in the other answer. Alternatively, just check whether both IS_IN_STACK and IS_STATIC return false.
The complete program using these macros:
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

int x;
extern int end, edata;
void *stack_bottom;

bool IS_IN_STACK(void *x) __attribute__((noinline));
bool IS_IN_STACK(void *x) {
    void *stack_top = &stack_top;
    return x <= stack_bottom && x >= stack_top;
}

#define IS_STATIC(x) ((void*)(x) <= (void*)&end || (void*)(x) <= (void*)&edata)

void foo (void)
{
    int s;
    int *h = malloc (sizeof(int));
    printf ("&x = %p, &s = %p, h = %p\n", (void *)&x, (void *)&s, (void *)h);
    // prints 0 1 0
    printf ("%d %d %d\n", IS_IN_STACK(&x), IS_IN_STACK(&s), IS_IN_STACK(h));
    // prints 1 0 0
    printf ("%d %d %d\n", IS_STATIC(&x), IS_STATIC(&s), IS_STATIC(h));
    free (h);
}

int main (int argc, char *argv[])
{
    int x;
    stack_bottom = &x;
    foo();
    return 0;
}
I guess that to get a reliable result you should parse the /proc/<pid>/maps file on Linux. Sample contents:
# cat maps
00400000-00407000 r-xp 00000000 fc:02 1837717 /sbin/getty
00606000-00607000 r--p 00006000 fc:02 1837717 /sbin/getty
00607000-00608000 rw-p 00007000 fc:02 1837717 /sbin/getty
00608000-0060a000 rw-p 00000000 00:00 0
0252e000-0254f000 rw-p 00000000 00:00 0 [heap]
7f3ca601f000-7f3ca6833000 r--p 00000000 fc:02 2105304 /usr/lib/locale/locale-archive
...
7f3ca7656000-7f3ca7657000 r--p 00022000 fc:02 1711858 /lib/x86_64-linux-gnu/ld-2.19.so
7f3ca7657000-7f3ca7658000 rw-p 00023000 fc:02 1711858 /lib/x86_64-linux-gnu/ld-2.19.so
7f3ca7658000-7f3ca7659000 rw-p 00000000 00:00 0
7fffbbcf2000-7fffbbd13000 rw-p 00000000 00:00 0 [stack]
7fffbbdfc000-7fffbbdfe000 r-xp 00000000 00:00 0 [vdso]
7fffbbdfe000-7fffbbe00000 r--p 00000000 00:00 0 [vvar]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Refer to proc(5).
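A minimal sketch of such a parser, assuming Linux and the line format shown in the sample (start-end perms offset dev inode [pathname]). The helper names mapping_label and classify are hypothetical. The code reads /proc/self/maps on every call, so a real tool would probably cache the mappings and refresh them when they change.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: finds the mapping in /proc/self/maps that contains
 * addr and copies its pathname column ("[heap]", "[stack]", a file path,
 * or "" for anonymous memory) into label.
 * Returns 1 if found, 0 if not found, -1 if the file cannot be opened. */
int mapping_label(const void *addr, char *label, size_t size)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    uintptr_t target = (uintptr_t)addr;
    char line[512];
    int found = 0;

    while (!found && fgets(line, sizeof line, maps) != NULL) {
        uintptr_t start, end;
        char path[256] = "";
        /* perms, offset, dev and inode are skipped; pathname is optional */
        int n = sscanf(line,
                       "%" SCNxPTR "-%" SCNxPTR " %*s %*s %*s %*s %255[^\n]",
                       &start, &end, path);
        if (n >= 2 && target >= start && target < end) {
            snprintf(label, size, "%s", path);
            found = 1;
        }
    }
    fclose(maps);
    return found;
}

/* Hypothetical helper: maps the label to a coarse region name. */
const char *classify(const void *addr)
{
    char label[256];
    if (mapping_label(addr, label, sizeof label) != 1)
        return "unknown";
    if (strcmp(label, "[stack]") == 0)
        return "stack";
    if (strcmp(label, "[heap]") == 0)
        return "heap";
    return "other";   /* file-backed or anonymous mapping, e.g. .data/.bss */
}
```

In the main thread, classify() on the address of a local variable should return "stack". Static variables come back as "other", because they live in the writable mapping of the executable or in the anonymous mapping right after it, as in the sample above. Memory obtained through mmap(), including large malloc() blocks and the stacks of other threads, may also come back as "other".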