 

Abort instead of segfault with clear memory violation

I came upon this weird behaviour when dealing with C strings. This is an exercise from the K&R book where I was supposed to write a function that appends one string onto the end of another string. This obviously requires the destination string to have enough memory allocated so that the source string fits. Here is the code:

#include <stdio.h>

/* strcat: Copies the contents of source to the end of dest */
char *strcat(char *dest, const char *source) {
  char *d = dest;
  // Move to the end of dest
  while (*dest != '\0') {
    dest++;
  } // *dest is now '\0'

  while (*source != '\0') {
    *dest++ = *source++;
  }
  *dest = '\0';
  return d;
}

During testing I wrote the following, expecting a segfault to happen while the program is running:

int main() {
  char s1[] = "hello";
  char s2[] = "eheheheheheh"; 
  printf("%s\n", strcat(s1, s2));
}

As far as I understand, s1 gets an array of 6 chars allocated and s2 an array of 13 chars. I thought that when strcat tries to write to s1 at index 6 or higher the program would segfault. Instead the output is printed correctly, but the program doesn't exit cleanly:

helloeheheheheheh
zsh: abort      ./a.out

and exits with code 134, which I think just means abort.

Why am I not getting a segfault (or overwriting s2 if the strings are allocated on the stack)? Where are these strings in memory (the stack, or the heap)?

Thanks for your help.

mck, asked Dec 26 '22 23:12


2 Answers

I thought that when strcat tries to write to s1 at index 6 or higher the program would segfault.

Writing outside the bounds of memory you have allocated on the stack is undefined behaviour. Invoking undefined behaviour may result in a segfault, but it doesn't have to: you can't be sure that a segfault will happen.

The Wikipedia article on undefined behavior explains it quite nicely:

When an instance of undefined behavior occurs, so far as the language specification is concerned anything could happen, maybe nothing at all.

So, in this case, you could get a segfault, the program could abort, or sometimes it could just run fine. Or, anything. There is no way of guaranteeing the result.

Where are these strings in memory (the stack, or the heap)?

Since you've declared them as char [] inside main(), they are arrays that have automatic storage, which for practical purposes means they're on the stack.

Timothy Jones, answered Jan 13 '23 06:01


Edit 1:

I'll try to explain how you might go about discovering the answer for yourself. I'm not sure what actually happens, since this is undefined behavior (as others have stated), but some simple debugging will show you what your compiler is actually doing.

Original Answer

My guess would be that they are both on the stack. You can check this by modifying your code with:

int main() {
  char c1 = 'X';
  char s1[] = "hello";
  char s2[] = "eheheheheheh"; 
  char c2 = '3';

  printf("%s\n", strcat(s1, s2));
}

c1 and c2 are going to be on the stack. Knowing that, you can check whether s1 and s2 are as well.

If s1 and s2 sit between c1 and c2 in memory (or right next to them), they are on the stack too. Otherwise they are probably in your .data section (which would be the smart thing to do but would break recursion).

The reason I'm betting on the strings being on the stack is that if you modify them in the function, and that function calls itself, then the second call would not have its own copy of the strings and the program would not behave correctly. However, the compiler can still tell that this function isn't recursive and put the strings in .data, so I could be wrong.

Assuming my guess that they are on the stack is right, in your code

int main() {
  char s1[] = "hello";
  char s2[] = "eheheheheheh"; 
  printf("%s\n", strcat(s1, s2));
}

"hello" (with its null terminator) is placed on the stack, followed by "eheheheheheh" (with its null terminator).

They are located one after the other (by plain luck of the order in which you declared them), forming a single block of memory that you can write to (but shouldn't!). That's why there is no segfault; you can see this by setting a breakpoint before printf and looking at the addresses.

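(uintptr_t)s2 == (uintptr_t)s1 + sizeof s1 should be true if I'm right. (Use sizeof s1, which is 6, rather than strlen(s1) + 1: once strcat has run, strlen(s1) is 17.)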

If you change your code to

int main() {
  char s1[] = "hello";
  char c = '3';
  char s2[] = "eheheheheheh"; 
  printf("%s\n", strcat(s1, s2));
}

you should see c overwritten if I'm right...

However, if I'm wrong and they are in the .data section, they could still be adjacent and you would be overwriting them without a segfault.

If you really want to know, disassemble it:

Unfortunately I only know how to do this on Linux. Use nm <binary> > <text file>.txt or objdump -t <your_binary> > <text file>.sym to dump all the symbols in your program. objdump -t also lists the section each symbol resides in, and nm shows it as a one-letter type code (for example d or D for the initialized data section, b or B for .bss).

Search the file for the s1 and s2 symbols. If you don't find them, they are most likely on the stack, but we will confirm that in the next step.

Use objdump -S your_binary > text_file.S, which interleaves your source code with the disassembly (make sure you built your binary with debug symbols, e.g. with -g), and then open the .S file in a text editor.

Again, search for s1 and s2 (hopefully there are no other matches).

If their definitions are followed by a push, a sub from %esp (%rsp on x86-64), or stores relative to the frame or stack pointer, then they are on the stack. If you're unsure what the disassembly means, post it here and we'll take a look.

nonsensickle, answered Jan 13 '23 05:01