I've been delving deeper into Linux and C, and I'm curious how functions are stored in memory. I have the following function:
void test(){
    printf( "test\n" );
}
Simple enough. When I run objdump on the executable that has this function, I get the following:
08048464 <test>:
8048464: 55 push %ebp
8048465: 89 e5 mov %esp,%ebp
8048467: 83 ec 18 sub $0x18,%esp
804846a: b8 20 86 04 08 mov $0x8048620,%eax
804846f: 89 04 24 mov %eax,(%esp)
8048472: e8 11 ff ff ff call 8048388 <printf@plt>
8048477: c9 leave
8048478: c3 ret
Which all looks right. The interesting part is when I run the following piece of code:
int main( void ) {
    char data[20];
    int i;
    memset( data, 0, sizeof( data ) );
    memcpy( data, test, 20 * sizeof( char ) );
    for( i = 0; i < 20; ++i ) {
        printf( "%x\n", data[i] );
    }
    return 0;
}
I get the following (which is incorrect):
55
ffffff89
ffffffe5
ffffff83
ffffffec
18
ffffffc7
4
24
10
ffffff86
4
8
ffffffe8
22
ffffffff
ffffffff
ffffffff
ffffffc9
ffffffc3
If I opt to leave out the memset( data, 0, sizeof( data ) ); line, then the right-most byte is correct, but some of them still have the leading 1s.
Does anyone have any explanation for
1. why using memset to clear my array results in an incorrect (or inaccurate) representation of the function, and
2. what is this byte stored as in memory? ints? char? I don't quite understand what's going on here. (Clarification: what type of pointer would I use to traverse such data in memory?)
My immediate thought is that this is a result of x86 having instructions that don't end on a byte or half-byte boundary. But that doesn't make a whole lot of sense, and shouldn't cause any problems.
In machine code, a function is referenced by its location in RAM, not its name. The object file the compiler outputs may have an entry in its symbol table referring to this block of machine code, but the symbol table is read by software; it is not something the CPU hardware can decode and run directly.
Most modern architectures act mostly the same way; block-scope variables and function arguments will be allocated from the stack, file-scope and static variables will be allocated from a data or code segment, dynamic memory will be allocated from a heap, some constant data will be stored in read-only segments, etc.
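To make the two points above a bit more concrete, here is a minimal sketch in C. The names (file_scope_counter, next_id, make_heap_int, call_by_address) are made up for illustration, and which segment each object actually lands in depends on the compiler and platform, so treat the comments as the typical case rather than a guarantee.

```c
#include <assert.h>
#include <stdlib.h>

/* File-scope variable: static storage, typically in a data or BSS segment. */
int file_scope_counter = 0;

/* Returns 1, 2, 3, ... on successive calls. */
int next_id(void) {
    static int calls = 0; /* static storage: keeps its value between calls */
    int step = 1;         /* block-scope: typically on the stack, fresh each call */
    calls += step;
    file_scope_counter = calls;
    return calls;
}

/* Dynamic memory comes from the heap; the caller must free() the result. */
int *make_heap_int(int value) {
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = value;
    }
    return p;
}

/* The callee only receives an address; the name "next_id" is gone by now. */
int call_by_address(int (*fn)(void)) {
    assert(fn != NULL);
    return fn(); /* indirect call through the function's location */
}
```

Calling call_by_address(next_id) behaves the same as calling next_id() directly, since the function name decays to the address of its machine code.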
Python stores objects in heap memory and references to those objects on the stack. Variables and function names are stored on the stack, and the objects they refer to are stored on the heap.
Functions are objects. Therefore, the function's identifier is on the stack, and the function's value is stored on the heap. A function creates an activation object when it's called.
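I believe your chars are being sign-extended to the width of an int when they are passed to printf. On your platform char is signed, so any byte with the high bit set (0x89, 0xe5, ...) becomes a negative value and prints with leading ff's. You might get results closer to what you want by explicitly casting the value to unsigned char when you print it, or by declaring the array as unsigned char in the first place.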
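Here's a rough sketch of what I mean. The helper names (format_promoted, format_as_byte, hex_dump) are just ones I made up, and the "ffffff89" result assumes a 32-bit int. The last helper also shows the kind of pointer I'd use to walk raw bytes: unsigned char *.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats a byte the way the loop in the question does: the signed char is
   promoted to int on its way into the printf-style call, so a value with
   the high bit set gets sign-extended (assuming a 32-bit int). */
void format_promoted(char *out, size_t size, signed char byte) {
    assert(out != NULL && size > 0);
    snprintf(out, size, "%x", byte);
}

/* Same byte, converted to unsigned char first so only the low 8 bits print. */
void format_as_byte(char *out, size_t size, signed char byte) {
    assert(out != NULL && size > 0);
    snprintf(out, size, "%02x", (unsigned int)(unsigned char)byte);
}

/* Walks count bytes through an unsigned char pointer and writes them to out
   as space-separated two-digit hex values, stopping early if out is full. */
void hex_dump(char *out, size_t size, const void *src, size_t count) {
    const unsigned char *p = src;
    size_t used = 0;
    assert(out != NULL && size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < count && used + 3 < size; ++i) {
        if (i > 0) {
            out[used++] = ' ';
        }
        used += (size_t)snprintf(out + used, size - used, "%02x",
                                 (unsigned int)p[i]);
    }
}
```

In your program the one-line version of this would be printf( "%02x\n", (unsigned char)data[i] ); inside the loop.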