 

How does printf() var-arg referencing interact with stack memory layout?

Given the code snippet:

#include <stdio.h>

int main(void)
{
    printf("Val: %d", 5);
    return 0;
}

Is there any guarantee that the compiler would store "Val: %d" and '5' contiguously? For example:

+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| ... |  %d | ' ' | ':' | 'l' | 'a' | 'V' | '5' | ... |
+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      ^                                   ^     ^
      |           Format String           | int |

Exactly how are these parameters allocated in memory?

Furthermore, does printf access the int relative to the format string, or by an absolute address? For example, with the layout sketched above, when the function encounters %d, would there already be a stored memory address for the first parameter after the format string that gets referenced, or would the address be calculated relative to the first element of the format string?

Sorry if I'm being confusing; my primary goal is to understand format string exploits, where the user is allowed to supply the format string, as described in this document:

http://www.cis.syr.edu/~wedu/Teaching/cis643/LectureNotes_New/Format_String.pdf

My concerns arise from the attack described on pages 3 and 4. I figured that the %x's are there to skip the 16 bits that the string takes up, which would indicate that the function allocates contiguously and references relatively. But other sources indicate that there is no guarantee the compiler must allocate contiguously, and I was concerned that the paper was a simplification.

asked Aug 02 '15 02:08 by Mikey G



2 Answers

is there any guarantee that the compiler would store "Val: %d" and '5' contiguously

It's virtually guaranteed they won't be. The 5 is small enough that it can be embedded right in the instruction stream as an immediate operand rather than loaded through a memory address (pointer). That means something like movl $5, %eax, possibly followed by a push onto the stack. The string object, on the other hand, will be laid out in the read-only data area of the executable image and referenced via a pointer. We're talking about the compile-time layout of the executable image here.

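Unless you mean the runtime layout of the stack, in which case yes: in a build that pushes its arguments (32-bit x86, for example), the word-sized pointer to that string and the word-sized constant 5 will be next to each other. But the order is probably the reverse of what you expect, because the C calling convention (cdecl) pushes arguments right to left, so the pointer to the format string ends up closest to the top of the stack. Study 'C function calling convention'.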

[Later edit: I'm running some code samples with -S (output assembly) now, and I'm reminded that when the called function takes few arguments, they can be passed entirely in registers to save instructions and memory. On 64-bit Linux, for example, the System V x86-64 ABI passes the first six integer or pointer arguments in registers and floating-point arguments in xmm registers. So the layout of the stack is actually tricky to predict, even if the attacker had access to the source code. This is especially true with gcc -O2, which collapsed my main -> my_function -> printf call sequence into main -> printf.]

Most exploits employ stack overruns, since malicious code runs into a brick wall trying to modify memory in the aforementioned read-only data area: the OS aborts the process.

The behavior of printf is peculiar in that the format string is like a miniature computer program that tells printf to fetch the next argument for every '%' format specifier it finds. If those arguments were never actually passed, or were of different sizes, printf will blindly traverse portions of the stack it shouldn't and perhaps reveal data further up the stack (down the call chain) where private data may lie. If the first argument to printf is a string literal, the compiler can at least warn you when the subsequent arguments don't match the '%' specifiers, but when it's a variable, all bets are off.
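To make the "miniature program" idea concrete, here is a hypothetical, heavily simplified formatter built on the standard `<stdarg.h>` macros. `mini_format` is an invented name, and this is not how any real libc implements printf. It is a sketch that assumes only `%d`, `%s` and `%%` need handling. It shows that the callee fetches each argument with `va_arg`, in order, driven purely by the specifiers it parses. The position of the next argument comes from the `va_list` state, not from where the format string's characters live in memory. On 32-bit x86, that state is effectively a pointer that starts just past the stack slot holding the `fmt` pointer. Nothing tells the callee how many arguments the caller actually supplied.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified formatter: supports only %d, %s and %%.
 * Writes at most cap - 1 characters plus a terminating NUL into out. */
static void mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;

    if (cap == 0)
        return;
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && len + 1 < cap; ++p) {
        char tmp[32];
        const char *piece = tmp;

        if (*p != '%') {
            /* Ordinary character: copied through unchanged. */
            tmp[0] = *p;
            tmp[1] = '\0';
        } else if (p[1] == 'd') {
            /* Each %d consumes the NEXT variadic argument as an int.
             * The callee cannot check that such an argument exists. */
            snprintf(tmp, sizeof tmp, "%d", va_arg(ap, int));
            ++p;
        } else if (p[1] == 's') {
            /* Each %s consumes the next argument as a string pointer. */
            piece = va_arg(ap, const char *);
            ++p;
        } else if (p[1] == '%') {
            tmp[0] = '%';
            tmp[1] = '\0';
            ++p;
        } else {
            break; /* unsupported specifier or trailing '%': stop here */
        }

        /* Append the piece, truncating if the buffer is full. */
        size_t n = strlen(piece);
        if (n > cap - 1 - len)
            n = cap - 1 - len;
        memcpy(out + len, piece, n);
        len += n;
    }
    va_end(ap);
    out[len] = '\0';
}
```

For example, `mini_format(buf, sizeof buf, "Val: %d", 5)` leaves `Val: 5` in `buf`. Calling it with `"%d %d"` but only one int argument is undefined behavior, exactly as with the real printf. The second `va_arg` simply reads whatever occupies the next argument slot.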

printf is awful from a security perspective and is computationally intensive, but very powerful and expressive. Welcome to C. :-)
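The usual mitigation is to never let untrusted data be the format argument: write `printf("%s", s)` rather than `printf(s)`. GCC's `-Wformat-security` flags the latter form when the format is not a string literal and no other arguments follow. Below is a minimal sketch of a wrapper that keeps that discipline. The helper names `log_to` and `log_user_text` are hypothetical. `__attribute__((format(printf, 3, 4)))` is a GCC/Clang extension, not standard C, that asks the compiler to apply its printf argument checking to a custom variadic wrapper.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical logging helper. The format attribute (GCC/Clang extension)
 * says "parameter 3 is a printf-style format, varargs start at 4", so the
 * compiler checks each call site's arguments as it does for printf. */
__attribute__((format(printf, 3, 4)))
static int log_to(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(out, cap, fmt, ap);
    va_end(ap);
    return n;
}

/* Untrusted text is passed as the argument for "%s", never as the format,
 * so any '%' it contains is copied literally instead of being interpreted. */
static int log_user_text(char *out, size_t cap, const char *untrusted)
{
    return log_to(out, cap, "user said: %s", untrusted);
}
```

With this design, a hostile input such as `"%x %x %n"` comes out as those literal characters, and printf never walks the stack on the attacker's behalf.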

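2nd later edit: Now for your first question in the comments. As you can see, your terminology and perhaps your thoughts were a bit garbled. Study the following to get a sense of what's going on, and don't worry about pointers to strings yet. This was compiled with gcc 4.8.2 on 64-bit Linux 3.13 with no flags. Note how the excess format specifiers reveal arguments that were passed in a previous function call. On x86-64, the first five values printed (b through f) are actually stale contents of the argument registers rsi, rdx, rcx, r8 and r9. printf consults those registers before the stack because the System V ABI passes the first six integer arguments in registers. Only the later specifiers read from the stack, and some of those values change from run to run.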

/* Do not compile this at home. */
#include <stdio.h>

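int second() {
  /* Deliberately passes no arguments for the eight specifiers: undefined
     behavior that prints whatever sits where the arguments should be. */
  printf("%08X %08X %08X %08X %08X %08X %08X %08X\n");
}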

int first(int a, int b, int c, int d, int e, int f, int g, int h) {
  second();
}

int main(int argc, char **argv) {
  first(0xDEEDC0DE, 0x1EADBEEF, 0x11BEDEAD, 0xCAFAF000, 0xDAFEBABE, 0xAACEBACE, 0xE1ED1EAA, 0x10F00FAA);
  return 0;
}

Output (stdout) of two back-to-back runs:

1EADBEEF 11BEDEAD CAFAF000 DAFEBABE AACEBACE 75F83520 00400568 88B151C8

1EADBEEF 11BEDEAD CAFAF000 DAFEBABE AACEBACE 8B4CBDC0 00400568 7BB841C8

answered Oct 17 '22 08:10 by BaseZen



Interesting question. Here is the assembly output from one test program built two ways: 32-bit with MSVC and 64-bit with GCC:

Test program:

/*
 * Sample output:
 * A
 * B: 49, 2, 5.000000
 */
#include <stdio.h>

int main(int argc, char *argv[]) {
  printf ("A\n");
  printf ("B: %d, %c, %f\n", 0x31, 0x32, 5.0);
  return 0;
}

MSVS/32-bit assembly (cl /Fa):

_DATA   SEGMENT
$SG2938 DB  'A', 0aH, 00H
    ORG $+1
$SG2939 DB  'B: %d, %c, %f', 0aH, 00H
...
CONST   SEGMENT
__real@4014000000000000 DQ 04014000000000000r   ; 5
...
    push    OFFSET $SG2938
    call    _printf
...
    movsd   xmm0, QWORD PTR __real@4014000000000000
    movsd   QWORD PTR [esp], xmm0
    push    50                  ; 00000032H
    push    49                  ; 00000031H
    push    OFFSET $SG2939
    call    _printf

GCC/64-bit assembly (gcc -S):

.LC0:
        .string "A"
.LC1:
        .string "B: %d, %c, %f\n"
...
        movl    %edi, -4(%rbp)   // You'll notice that GCC substitutes "puts()" for "printf()" here
        movq    %rsi, -16(%rbp)
        movl    $.LC0, %edi
        call    puts
...
        movl    $.LC1, %eax     // Also notice the absence of "push": we're passing arguments in registers, instead of on the stack
        movsd   .LC2(%rip), %xmm0
        movl    $50, %edx
        movl    $49, %esi
        movq    %rax, %rdi
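        movl    $1, %eax        // %al = number of vector (xmm) registers carrying arguments, required for variadic calls by the x86-64 SysV ABI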
        call    printf
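The sizes in both listings follow from C's rules for variadic arguments. Any argument that matches `...` undergoes the default argument promotions: `char` and `short` become `int`, and `float` becomes `double`. That is why the 32-bit code pushes 49 and 50 as full 32-bit words and stores 5.0 as an 8-byte value, and why `%c` and `%f` consume an `int` and a `double`. The sketch below uses a hypothetical function, `char_plus_float`, to show what the receiving side of such a call has to do.

```c
#include <stdarg.h>

/* Hypothetical helper mirroring the "%c" and "%f" slots of the call above.
 * A char argument arrives promoted to int and a float arrives promoted to
 * double, so they must be read back with those types; va_arg(ap, char) or
 * va_arg(ap, float) would be undefined behavior. */
static double char_plus_float(int named, ...)
{
    va_list ap;
    int c;
    double f;

    (void)named; /* present only because va_start needs a named parameter */
    va_start(ap, named);
    c = va_arg(ap, int);    /* the char argument, promoted to int */
    f = va_arg(ap, double); /* the float argument, promoted to double */
    va_end(ap);
    return c + f;
}
```

Calling `char_plus_float(0, (char)'2', 5.0f)` returns 55.0 on an ASCII system, because `'2'` is 0x32 (50).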
answered Oct 17 '22 07:10 by paulsm4
