I came upon this weird behaviour when dealing with C strings. This is an exercise from the K&R book where I was supposed to write a function that appends one string onto the end of another string. This obviously requires the destination string to have enough memory allocated so that the source string fits. Here is the code:
    /* strcat: Copies contents of source at the end of dest */
    char *strcat(char *dest, const char *source) {
        char *d = dest;
        // Move to the end of dest
        while (*dest != '\0') {
            dest++;
        } // *dest is now '\0'
        while (*source != '\0') {
            *dest++ = *source++;
        }
        *dest = '\0';
        return d;
    }
During testing I wrote the following, expecting a segfault to happen while the program is running:
    #include <stdio.h> /* for printf */

    int main() {
        char s1[] = "hello";
        char s2[] = "eheheheheheh";
        printf("%s\n", strcat(s1, s2));
    }
As far as I understand, s1 gets an array of 6 chars allocated and s2 an array of 13 chars. I thought that when strcat tries to write to s1 at indexes higher than 6 the program would segfault. Instead everything works fine, but the program doesn't exit cleanly. It prints:
helloeheheheheheh
zsh: abort ./a.out
and exits with code 134, which I think just means abort.
Why am I not getting a segfault (or overwriting s2 if the strings are allocated on the stack)? Where are these strings in memory (the stack, or the heap)?
Thanks for your help.
I thought that when strcat tries to write to s1 at indexes higher than 6 the program would segfault.
Writing outside the bounds of memory you have allocated on the stack is undefined behaviour. Invoking this undefined behaviour usually (but not always) results in a segfault. However, you can't be sure that a segfault will happen.
The Wikipedia article on undefined behavior explains it quite nicely:
When an instance of undefined behavior occurs, so far as the language specification is concerned anything could happen, maybe nothing at all.
So, in this case, you could get a segfault, the program could abort, or sometimes it could just run fine. Or, anything. There is no way of guaranteeing the result.
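If you would rather get a failure you can rely on than undefined behaviour, one option is to pass the destination's capacity and check it before copying. Here is a minimal sketch of that idea; append_checked and dest_size are names made up for illustration, not anything from K&R or the standard library:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends src to dest only if the result, including
 * the terminating '\0', fits in dest_size bytes.
 * Returns 0 on success, -1 if there is not enough room (dest is untouched). */
int append_checked(char *dest, size_t dest_size, const char *src) {
    size_t dest_len = strlen(dest);
    size_t src_len = strlen(src);

    /* Need dest_len + src_len + 1 <= dest_size, written to avoid overflow. */
    if (dest_len >= dest_size || src_len >= dest_size - dest_len) {
        return -1; /* would write past the end of dest */
    }
    memcpy(dest + dest_len, src, src_len + 1); /* +1 copies the '\0' too */
    return 0;
}
```

With char s1[] = "hello" and sizeof s1 as the capacity, appending s2 returns -1 instead of writing past the array. Declaring char s1[32] = "hello" gives it enough room to succeed.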
Where are these strings in memory (the stack, or the heap)?
Since you've declared them as char [] inside main(), they are arrays that have automatic storage, which for practical purposes means they're on the stack.
Edit 1:
I'm going to try and explain how you might go about discovering the answer for yourself. I'm not sure what actually happens as this is not defined behavior (as others have stated), but you can do some simple debugging to figure out what your compiler is actually doing.
Original Answer
My guess would be that they are both on the stack. You can check this by modifying your code with:
    int main() {
        char c1 = 'X';
        char s1[] = "hello";
        char s2[] = "eheheheheheh";
        char c2 = '3';
        printf("%s\n", strcat(s1, s2));
    }
c1 and c2 are going to be on the stack. Knowing that, you can check whether s1 and s2 are as well.
If the address of c1 is less than that of s1 and the address of s1 is less than that of c2, then it is on the stack. Otherwise it is probably in your .bss section (which would be the smart thing to do but would break recursion).
The reason I'm banking on the strings being on the stack is that if you are modifying them in the function, and that function calls itself, then the second call would not have its own copy of the strings and hence would not be valid. However, the compiler still knows that this function isn't recursive and can put the strings in the .bss, so I could be wrong.
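A rough sketch of that address check, assuming you are happy to compare addresses as integers via uintptr_t. The helper names lies_between and arrays_between_markers are hypothetical, and it accepts either ordering of the two markers. The compiler is free to reorder locals, so treat the result as a hint about your particular build rather than proof:

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if address p lies between addresses a and b (in either order). */
int lies_between(const void *p, const void *a, const void *b) {
    uintptr_t up = (uintptr_t)p, ua = (uintptr_t)a, ub = (uintptr_t)b;
    uintptr_t lo = ua < ub ? ua : ub;
    uintptr_t hi = ua < ub ? ub : ua;
    return lo <= up && up <= hi;
}

/* Prints the addresses of the four locals and returns 1 if both arrays
 * sit between the two marker chars, 0 otherwise. */
int arrays_between_markers(void) {
    char c1 = 'X';
    char s1[] = "hello";
    char s2[] = "eheheheheheh";
    char c2 = '3';

    printf("c1=%p s1=%p s2=%p c2=%p\n",
           (void *)&c1, (void *)s1, (void *)s2, (void *)&c2);
    return lies_between(s1, &c1, &c2) && lies_between(s2, &c1, &c2);
}
```

Call arrays_between_markers() from main and look at the printed addresses. A return value of 0 does not prove the arrays are off the stack, because some compilers deliberately group arrays away from scalar locals.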
Assuming my guess that it is on the stack is right, in your code
    int main() {
        char s1[] = "hello";
        char s2[] = "eheheheheheh";
        printf("%s\n", strcat(s1, s2));
    }
"hello" (with the null terminator) is pushed onto the stack, followed by "eheheheheheh" (with the null terminator).
They are both located one after the other (thanks to plain luck of the order in which you wrote them), forming a single memory block that you can write to (but shouldn't!). That's why there is no seg fault. You can see this by breaking before printf and looking at the addresses.
    (uintptr_t)s2 == (uintptr_t)s1 + strlen(s1) + 1
should be true if I'm right.
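If you'd rather not fire up a debugger, the same adjacency test can be written as a small helper. This is only a sketch, and starts_right_after is a hypothetical name; it converts both pointers to uintptr_t so that the comparison is between integers:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if `second` starts at the byte immediately after the '\0'
 * that terminates `first`, 0 otherwise. */
int starts_right_after(const char *first, const char *second) {
    return (uintptr_t)second == (uintptr_t)first + strlen(first) + 1;
}
```

Call starts_right_after(s1, s2) in main before the strcat call and print the result. It is worth trying starts_right_after(s2, s1) as well, since the compiler may have placed s2 first.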
Modifying your code to
    int main() {
        char s1[] = "hello";
        char c = '3';
        char s2[] = "eheheheheheh";
        printf("%s\n", strcat(s1, s2));
    }
should show c overwritten if I'm right...
However, if I'm wrong and it is in the .bss section, then they could still be adjacent and you would be overwriting them without a seg fault.
If you really want to know, disassemble it:
Unfortunately I only know how to do it on Linux. Try using the nm <binary> > <text file>.txt command or the objdump -t <your_binary> > <text file>.sym command to dump all the symbols from your program. The commands should also give you the section in which each symbol resides.
Search the file for the s1 and s2 symbols. If you don't find them, it should mean that they are on the stack, but we will check that in the next step.
Use the objdump -S your_binary > text_file.S command (make sure you built your binary with debug symbols) and then open the .S file in a text editor.
Again, search for the s1 and s2 symbols (hopefully there aren't any others; I suspect not, but I'm not sure).
If you find their definitions followed by a push or sub %esp instruction (sub %rsp on x86-64), then they are on the stack. If you're unsure about what their definitions mean, post it back here and let us have a look.