Segmentation fault in strcpy

Question

Consider the program below:

    char str[5];
    strcpy(str,"Hello12345678");
    printf("%s",str);

When run, this program gives a segmentation fault.

But when the strcpy call is replaced with the following, the program runs fine:

    strcpy(str,"Hello1234567");

So my question is: shouldn't it crash when copying any string longer than 4 characters into str (the array only has room for 4 characters plus the terminating zero byte)?

Why doesn't it crash for "Hello1234567", and only crash for "Hello12345678", i.e. for strings of length 13 or more?

This program was run on a 32-bit machine.

paxdiablo · Accepted Answer

There are three types of behaviour in the standard that you should be interested in.

1/ Defined behaviour. This will work on all conforming implementations. Use this freely.

2/ Implementation-defined behaviour. As stated, it depends on the implementation but at least it's still defined. Implementations are required to document what they do in these cases. Use this if you don't care about portability.

3/ Undefined behaviour. Anything can happen. And we mean anything, up to and including your entire computer collapsing into a naked singularity and swallowing itself, you and a large proportion of your workmates. Never use this. Ever! Seriously! Don't make me come over there.

Copying more than 4 characters and a zero-byte to a char[5] is undefined behaviour.

Seriously, it doesn't matter why your program crashes with 14 bytes (13 characters plus the terminator) but not with 13. You're almost certainly overwriting some information on the stack whose corruption doesn't cause a crash, and your program will most likely produce incorrect results anyway. In fact, the crash is better, since at least it stops you relying on the possibly bad effects.

Increase the size of the array to something more suitable (char[14] in this case with the available information) or use some other data structure that can cope.

Update:

Since you seem so concerned with finding out why an extra 7 characters don't cause problems but 8 do, let's envisage the possible stack layout on entering main(). I say "possible" since the actual layout depends on the calling convention that your compiler uses. Since the C start-up code calls main() with argc and argv, the stack at the start of main(), after allocating space for a char[5], could look like this:

    +------------------------------------+
    | C start-up code return address (4) |
    | argc (4)                           |
    | argv (4)                           |
    | x = char[5] (5)                    |
    +------------------------------------+

When you write the bytes Hello1234567\0 with:

    strcpy (x, "Hello1234567");

to x, it overwrites argc and argv but, on return from main(), that's okay. Specifically, Hello populates x, 1234 populates argv and 567\0 populates argc. Provided you don't actually try to use argc and/or argv after that, you'll be okay:

    +------------------------------------+ Overwrites with:
    | C start-up code return address (4) |
    | argc (4)                           |   '567<NUL>'
    | argv (4)                           |   '1234'
    | x = char[5] (5)                    |   'Hello'
    +------------------------------------+

However, if you write Hello12345678\0 (note the extra "8") to x, it overwrites the argc and argv and also one byte of the return address so that, when main() attempts to return to the C start-up code, it goes off into fairy land instead:

    +------------------------------------+ Overwrites with:
    | C start-up code return address (4) |   '<NUL>'
    | argc (4)                           |   '5678'
    | argv (4)                           |   '1234'
    | x = char[5] (5)                    |   'Hello'
    +------------------------------------+

Again, this depends entirely on the calling convention of your compiler. It's possible a different compiler would always pad out arrays to a multiple of 4 bytes and the code wouldn't fail there until you wrote another three characters. Even the same compiler may allocate variables on the stack frame differently to ensure alignment is satisfied.

That's what they mean by undefined: you don't know what's going to happen.

Douglas Leeder · Answer

You're copying to the stack, so how much extra data it takes to crash your program depends on what the compiler has placed on the stack.

Some compilers might produce code that crashes with only a single byte over the buffer size; the behaviour is undefined.

I'd guess that 13 characters is enough to overwrite the return address, or something similar, which causes a crash when your function returns. Another compiler or another platform could, and probably will, crash at a different length.

Also, your program might crash at a shorter length if it ran for longer, since a shorter overflow may be overwriting something less important that is only used later.

sharptooth · Answer

On a 32-bit Intel platform the explanation is as follows. When you declare a char[5] on the stack, the compiler actually allocates 8 bytes because of alignment. It's also typical for functions to have the following prologue:

    push ebp
    mov ebp, esp

This saves the value of the ebp register on the stack, then copies esp into ebp so that this value can be used as a stable base for accessing the parameters. As a result, 4 more bytes on the stack are occupied by the saved ebp value.

In the epilogue ebp is restored, but its value is usually only used for accessing stack-allocated function parameters, so overwriting it may not hurt in most cases.

So you have the following layout (stack grows downwards on Intel): 8 bytes for your array, then 4 bytes for ebp, then usually the return address.

This is why you need to overwrite at least 13 bytes to crash your program.

Stephan202 · Answer

To add to the above answers: you can test for bugs like these with a tool such as Valgrind. If you're on Windows, have a look at this SO thread.

Tags: c++, c, undefined-behavior

anand
