I've noticed that my C compiler (gcc) will let me do stuff like:
#include <stdio.h>
main(){
short m[32768];
short y = -1;
short z = -1;
printf("%u\n", y);
m[y] = 12;
printf("%d\n%d\n", y, m[z]);
}
When I run it it spits out:
4294967295
12
12
Which seems a little baffling to me.
First of all, is it safe for me to run programs like this? Is there any chance I might accidentally write over the operating system (I'm running OS X in case it's relevant)?
Also, I had expected at least some kind of segfault error like I have encountered in the past, but quietly ignoring an error like this really scares me. How come this program doesn't segfault on me?
And finally, out of curiosity (this might be the silliest question), is there a method to the madness? Can I expect all ANSI C compilers to work this way? How about gcc on different platforms? Is the layout of memory well defined that it is exploitable (perhaps if you were out to write cross-platform obfuscated code)?
It's basically impossible to write safe code in C.
The purpose of C Secure is to specify secure coding rules that can be automatically enforced. These can be used to detect security flaws in C programming. To be considered a security flaw, a software bug must be triggerable by the actions of a malicious user or attacker.
Memory safe languages include Rust, Go, C#, Java, Swift, Python, and JavaScript. Languages that are not memory safe include C, C++, and assembly.
The C language defines the behavior of certain programs as "undefined". They can do anything. We'll call such programs erroneous.
One of them is a program that accesses outside the declared/allocated bounds of an array, which your program very carefully does.
You program is erroneous; the thing your erroneous program happens to do is what you see :-} It could "overwrite the OS"; as a practical matter, most modern OSes prevent you from doing that, but you can overwrite critical values in your process space, and your process could crash, die or hang.
The simple response is, "don't write erroneous programs". Then the behavior you see will make "C" sense.
In this particular case, with your particular compiler, the array indexing "sort of" works: you index outside the array and it picks up some value. The space allocated to m is in the stack frame; m[0] is at some location in the stack frame and so is "m[-1]" based on machine arithmetic combining the array address and the index, so a segfault does not occur and a memory location is accessed. This lets the compiled program read and write that memory location ... as an erroneous program. Basically, compiled C programs don't check to see if your array access is out of bounds.
Our CheckPointer tool when applied to this program will tell you the array index is illegal at execution time. So, you can either eyeball the program yourself to see if you've made a mistake, or let CheckPointer tell you when you make a mistake. I strongly suggest you do the eyeballing in any case.
You may be interested in INRIA's CompCert C, a formally, mathematically verifiable and verified implementation of the C language. It's the same authors as the famous Coq proof assistant. There is also another variante Verifiable C.
I don't know much about it, but I know that plane engineers in France use it to program the upcoming embedded computers in planes, so at least in France it's an officially accepted language for critical systems programming.
Lastly, note that a formally verifiable language is different from a safe language.
For example, MISRA C is said to be a safe C language (although this is debated), and there are also Safe-C, Microsoft's Checked-C and Cyclone, along with safe libraries without changing the compiler such as Safe C Library and libsrt, or just using the standard compiler and libraries but with a sourcecode analyzer such as frama-c.
But although safe languages provide fixes to some issues like buffer overflows but no guarantee of consistent logic flow as is needed for critical systems. For example, CompCert C should always produce the same set of Assembler instructions for the same C instructions. Formally verifiable language such as CompCert C and Ada provide such formal guarantees.
You may also be interested in these articles:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With