Given the code snippet:
#include <stdio.h>

int main()
{
    printf("Val: %d", 5);
    return 0;
}
Is there any guarantee that the compiler would store "Val: %d" and '5' contiguously? For example:
+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| ... | %d  | ' ' | ':' | 'l' | 'a' | 'V' | '5' | ... |
+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      ^                                   ^     ^
      |           Format String           | int |
Exactly how are these parameters allocated in memory?
Furthermore, does the printf function access the int relative to the format string or by absolute address? So, for example, in the data
+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| ... | %d  | ' ' | ':' | 'l' | 'a' | 'V' | '5' | ... |
+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      ^                                   ^     ^
      |           Format String           | int |
when the function encounters %d, would there already be a stored memory address for the first argument after the format string that gets referenced, or would the location be calculated relative to the first element of the format string?
Sorry if I'm being confusing; my primary goal is to understand format string exploits, where the user is allowed to supply the format string, as described in this document:
http://www.cis.syr.edu/~wedu/Teaching/cis643/LectureNotes_New/Format_String.pdf
My concern is the attack described on pages 3 and 4. I figured that the %x's are there to skip the 16 bits that the string takes up, which would indicate that the function allocates contiguously and references relatively, but other sources indicate that the compiler is not required to allocate contiguously, and I was concerned that the paper was a simplification.
Is there any guarantee that the compiler would store "Val: %d" and '5' contiguously?
It's virtually guaranteed they won't be. The 5 is small enough that it can be embedded right in the instruction stream as an immediate operand (something like movl $5, %eax, possibly followed by a push onto the stack) rather than loaded through a memory address (pointer), whereas the string object will be laid out in the read-only data area of the executable image and referenced via a pointer. We're talking about the compile-time layout of the executable image.
If you mean the runtime layout of the stack, then yes: when both arguments are passed on the stack (as with the 32-bit cdecl convention), the word-sized pointer to that string and the word-sized constant 5 will be next to each other. But the order is probably the reverse of what you expect, because cdecl pushes arguments right to left; study the C function calling conventions.
[Later edit: Running some code samples with -S (output assembly), I'm reminded that the arguments can be passed entirely in registers, which saves instructions and memory; on x86-64 Linux, the System V ABI passes the first six integer or pointer arguments in registers (rdi, rsi, rdx, rcx, r8, r9). So the layout of the stack is actually tricky to predict, even if the attacker has access to the source code, especially with gcc -O2, which collapsed my main -> my_function -> printf call sequence into main -> printf.]
Most exploits target the stack (stack overruns), since malicious code runs into a brick wall when it tries to modify memory in the read-only data area mentioned above: the OS aborts the process.
The behavior of printf is peculiar in that the format string is like a miniature computer program that tells printf to fetch the next argument for every '%' format specifier it finds. If those arguments were never in fact passed, or were of different sizes than the specifiers claim, printf will blindly read whatever occupies the next argument slots (registers or stack) and perhaps reveal data further up the stack (down the call chain), where private data may lie; as far as the C standard is concerned, this is undefined behavior. If the first argument to printf is a string literal, a compiler can at least warn you when subsequent arguments mismatch the '%' specifiers (for example, GCC's -Wformat), but when it's a variable, all bets are off.
printf is awful from a security perspective and is computationally intensive, but very powerful and expressive. Welcome to C. :-)
[2nd later edit] Now, about your first question in the comments: as you can see, your terminology, and perhaps your thinking, was a bit garbled. Study the following to get a sense of what's going on. Don't worry about pointers to strings yet. This was compiled with gcc 4.8.2 on 64-bit Linux 3.13 with no flags. Note how the excess format specifiers make printf read argument slots that were never filled, revealing arguments that were passed in a previous function call. On x86-64, the first five values come from the registers (rsi, rdx, rcx, r8, r9) that still hold first()'s arguments b through f; only the remaining values are read from the stack.
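/* Do not compile this at home: printf is deliberately called with
   fewer arguments than format specifiers (undefined behavior). */
#include <stdio.h>

void second(void) {
    /* Eight %08X specifiers, zero matching arguments. */
    printf("%08X %08X %08X %08X %08X %08X %08X %08X\n");
}

void first(unsigned a, unsigned b, unsigned c, unsigned d,
           unsigned e, unsigned f, unsigned g, unsigned h) {
    second();  /* the arguments are never used; they just occupy registers and stack */
}

int main(int argc, char **argv) {
    first(0xDEEDC0DE, 0x1EADBEEF, 0x11BEDEAD, 0xCAFAF000,
          0xDAFEBABE, 0xAACEBACE, 0xE1ED1EAA, 0x10F00FAA);
    return 0;
}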
Two back-to-back runs, stdout output:
1EADBEEF 11BEDEAD CAFAF000 DAFEBABE AACEBACE 75F83520 00400568 88B151C8
1EADBEEF 11BEDEAD CAFAF000 DAFEBABE AACEBACE 8B4CBDC0 00400568 7BB841C8
Interesting question. Here is the assembly output for one test program, compiled two ways: 32-bit with MSVC and 64-bit with GCC:
Test program:
/*
* Sample output:
* A
* B: 49, 2, 5.000000
*/
#include <stdio.h>
int main(int argc, char *argv[]) {
printf ("A\n");
printf ("B: %d, %c, %f\n", 0x31, 0x32, 5.0);
return 0;
}
MSVC/32-bit assembly (cl /Fa):
_DATA SEGMENT
$SG2938 DB 'A', 0aH, 00H
ORG $+1
$SG2939 DB 'B: %d, %c, %f', 0aH, 00H
...
CONST SEGMENT
__real@4014000000000000 DQ 04014000000000000r ; 5
...
push OFFSET $SG2938
call _printf
...
movsd xmm0, QWORD PTR __real@4014000000000000
movsd QWORD PTR [esp], xmm0
push 50 ; 00000032H
push 49 ; 00000031H
push OFFSET $SG2939 ; format string pushed last: arguments go right to left
call _printf
GCC/64-bit assembly (gcc -S):
.LC0:
.string "A"
.LC1:
.string "B: %d, %c, %f\n"
...
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $.LC0, %edi
call puts              # You'll notice that GCC substitutes puts() for printf() here
...
movl $.LC1, %eax       # Also notice the absence of "push": arguments are passed in registers, not on the stack
movsd .LC2(%rip), %xmm0
movl $50, %edx
movl $49, %esi
movq %rax, %rdi
movl $1, %eax          # for variadic calls, %al tells the callee how many vector registers are used
call printf