
How can I count CALL and RET instructions using ptrace?

I'm trying to dynamically count the function calls and returns of a program at runtime on x86_64 (Intel syntax).

To do this I'm using ptrace (without PTRACE_SYSCALL): I read the RIP register (which contains the address of the next instruction) and check the opcode at that address. I know that a CALL can be recognized when the least significant byte is 0xE8 (according to the Intel documentation, or http://icube-avr.unistra.fr/fr/images/4/41/253666.pdf page 105).

I looked up each instruction on http://ref.x86asm.net/coder64.html. In my program, each time I find 0xE8, 0x9A, 0xF1, etc., I count a function entry (CALL or INT instruction), and if it's 0xC2, 0xC3, etc., I count a function exit (RET instruction).

The goal is to do this for any program at runtime; I can't control how the test program is compiled, instrument it, or use gcc's magic tools.

I wrote a small program that can be compiled with gcc -Wall -Wextra your_file.c and run by typing ./a.out a_program.

Here is my code:

#include <sys/ptrace.h>
#include <sys/signal.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

typedef struct user_regs_struct    reg_t;

static int8_t       increase(pid_t pid, int32_t *status)
{
        if (WIFEXITED(*status) || WIFSIGNALED(*status))
                return (-1);
        if (WIFSTOPPED(*status) && (WSTOPSIG(*status) == SIGINT))
                return (-1);
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) == -1)
                return (-1);
        return (0);
}

int                 main(int argc, char *argv[])
{
    size_t          pid = fork();
    long            address_rip;
    uint16_t        call = 0;
    uint16_t        ret = 0;
    int32_t         status;
    reg_t           regs;

    if (!pid) {
            if ((status = ptrace(PTRACE_TRACEME, 0, NULL, NULL)) == -1)
                    return (1);
            kill(getpid(), SIGSTOP);
            execvp(argv[1], argv + 1);
    } else {
            while (42) {
                    waitpid(pid, &status, 0);
                    ptrace(PTRACE_GETREGS, pid, NULL, &regs);
                    address_rip = ptrace(PTRACE_PEEKDATA, pid, regs.rip, NULL);
                    address_rip &= 0xFFFF;
                    if ((address_rip & 0x00FF) == 0xC2 || (address_rip & 0x00FF) == 0xC3 ||
                        (address_rip & 0x00FF) == 0xCA || (address_rip & 0x00FF) == 0xCB ||
                        (address_rip & 0x00FF) == 0xCF)
                            ret += 1;
                    else if ((address_rip & 0x00FF) == 0xE8 || (address_rip & 0x00FF) == 0xF1 ||
                             (address_rip & 0x00FF) == 0x9A || (address_rip & 0x00FF) == 0xCC ||
                             (address_rip & 0x00FF) == 0xCD || (address_rip & 0x00FF) == 0xCF)
                            call += 1;
                    if (increase(pid, &status) == -1) {
                            printf("call: %i\tret: %i\n", call, ret);
                            return (0);
                    }
            }
    }
    return (0);
}

When I run it with a_program (a custom program that simply enters some local functions and makes some write syscalls; the goal is just to trace the number of functions entered and left), no error occurs and it works fine, BUT I don't get the same number of CALL and RET. Example:

user> ./a.out basic_program

call: 636 ret: 651

(The large number of calls and returns is caused by libc, which goes through a lot of functions before starting your program; see Parsing Call and Ret with ptrace.)

It looks like my program sees more returns than function calls. I found that opcode 0xFF is used for CALL or CALLF (r/m64 or r/m16/m32), but also for other instructions like DEC, INC or JMP (which are very common instructions).

So, how can I tell them apart? According to http://ref.x86asm.net/coder64.html it is done with the "opcode fields", but how can I find them?

If I add 0xFF to my condition:

else if ((address_rip & 0x00FF) == 0xE8 || (address_rip & 0x00FF) == 0xF1 ||
         (address_rip & 0x00FF) == 0x9A || (address_rip & 0x00FF) == 0xCC ||
         (address_rip & 0x00FF) == 0xCD || (address_rip & 0x00FF) == 0xCF ||
         (address_rip & 0x00FF) == 0xFF)
                call += 1;

If I launch it:

user> ./a.out basic_program

call: 1152 ret: 651

That seems normal to me, because it now counts every JMP, DEC or INC, so I need to distinguish between the different 0xFF instructions. I tried this:

 else if ((address_rip & 0x00FF) == 0xE8 || (address_rip & 0x00FF) == 0xF1 ||
         (address_rip & 0x00FF) == 0x9A || (address_rip & 0x00FF) == 0xCC ||
         (address_rip & 0x00FF) == 0xCD || (address_rip & 0x00FF) == 0xCF ||
         ((address_rip & 0x00FF) == 0xFF && ((address_rip & 0x0F00) == 0X02 ||
         (address_rip & 0X0F00) == 0X03)))
                call += 1;

But it gave me the same result. Am I wrong somewhere? How can I get the same number of calls and returns?

Volonté du Peuple asked May 04 '18 16:05


1 Answer

Here is an example of how to program this. An x86 instruction can be up to 15 bytes long, so 16 bytes must be peeked to be sure of getting a complete instruction. As each peek reads 8 bytes, you need to peek twice, once at regs.rip and once 8 bytes later:

peek1 = ptrace(PTRACE_PEEKDATA, pid, regs.rip, NULL);
peek2 = ptrace(PTRACE_PEEKDATA, pid, regs.rip + sizeof(long), NULL);
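One detail worth handling around these two peeks: PTRACE_PEEKDATA returns the word it read, so a return value of -1 can be either valid data or an error, and errno has to be cleared beforehand to tell them apart. A minimal sketch of a wrapper, assuming a 64-bit Linux target where sizeof(long) is 8; the name peek_insn and its interface are hypothetical, not part of the code in this answer:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/*
 * Hypothetical helper: copy the 2 * sizeof(long) bytes located at addr
 * in the tracee into out, in memory order.
 * Returns 0 on success, -1 if either peek fails.
 */
int peek_insn(pid_t pid, unsigned long long addr,
    unsigned char out[2 * sizeof(long)])
{
        size_t i;

        for (i = 0; i < 2; i++) {
                long word;

                /* -1 is a legal data word, so errno decides */
                errno = 0;
                word = ptrace(PTRACE_PEEKDATA, pid,
                    (void *)(uintptr_t)(addr + i * sizeof(long)), NULL);
                if (word == -1 && errno != 0)
                        return (-1);

                memcpy(out + i * sizeof word, &word, sizeof word);
        }

        return (0);
}
```

This sketch treats a failure of the second peek as fatal. If the instruction sits at the very end of a mapped page, the second word may be unreadable even though the first one is fine, so a more careful version might fall back to the first 8 bytes only.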

Note that the following code glosses over many details of how prefixes are handled and detects a number of invalid instructions as function calls. Note further that, if you want to use it for 32-bit code, it needs to be changed to recognize some more CALL instructions and to stop treating the REX byte range as prefixes:

int iscall(long peek1, long peek2)
{
        union {
                long longs[2];
                unsigned char bytes[16];
        } data;

        int opcode, reg; 
        size_t offset;

        /* turn peeked longs into bytes */
        data.longs[0] = peek1;
        data.longs[1] = peek2;

        /* ignore relevant prefixes */
        for (offset = 0; offset < sizeof data.bytes &&
            ((data.bytes[offset] & 0xe7) == 0x26 /* cs, ds, ss, es override */
            || (data.bytes[offset] & 0xfc) == 0x64 /* fs, gs, addr32, data16 override */
            || (data.bytes[offset] & 0xf0) == 0x40); /* REX prefix */
            offset++)
                ;

        /* instruction is composed of all prefixes */
        if (offset > 15)
                return (0);

        opcode = data.bytes[offset];


        /* E8: CALL NEAR rel32? */
        if (opcode == 0xe8)
                return (1);

        /* sufficient space for modr/m byte? */
        if (offset > 14)
                return (0);

        reg = data.bytes[offset + 1] & 0070; /* modr/m byte, reg field */

        if (opcode == 0xff) {
                /* FF /2: CALL NEAR r/m64? */
                if (reg == 0020)
                        return (1);

                /* FF /3: CALL FAR r/m32 or r/m64? */
                if (reg == 0030)
                        return (1);
        }

        /* not a CALL instruction */
        return (0);
}
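The question also counts returns, so the same prefix-skipping idea can be applied to RET. The following is only a sketch under stated assumptions: the name isret is hypothetical, it assumes a little-endian 64-bit target as above, and unlike iscall it also skips the F2/F3 (REPNE/REP) prefixes so that the common `rep ret` encoding (F3 C3) is recognized. IRET (0xCF) is deliberately not counted as a function return here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical companion to iscall(): returns 1 if the peeked bytes
 * start with a near or far RET (after optional prefixes), 0 otherwise.
 */
int isret(long peek1, long peek2)
{
        unsigned char bytes[2 * sizeof(long)];
        size_t offset;

        /* turn peeked longs into bytes, in memory order */
        memcpy(bytes, &peek1, sizeof peek1);
        memcpy(bytes + sizeof peek1, &peek2, sizeof peek2);

        /* skip prefixes */
        for (offset = 0; offset < sizeof bytes &&
            ((bytes[offset] & 0xe7) == 0x26 /* cs, ds, ss, es override */
            || (bytes[offset] & 0xfc) == 0x64 /* fs, gs, data16, addr32 */
            || (bytes[offset] & 0xf0) == 0x40 /* REX prefix */
            || (bytes[offset] & 0xfe) == 0xf2); /* REPNE, REP */
            offset++)
                ;

        /* nothing but prefixes */
        if (offset >= sizeof bytes)
                return (0);

        switch (bytes[offset]) {
        case 0xc2: /* RET NEAR imm16 */
        case 0xc3: /* RET NEAR */
        case 0xca: /* RET FAR imm16 */
        case 0xcb: /* RET FAR */
                return (1);
        default:
                return (0);
        }
}
```

In the tracing loop from the question, the result of the two peeks would be passed to both iscall and isret in place of the chain of comparisons on address_rip. Even then the two counters need not match exactly, for example when a function is entered by a JMP (tail call) or the process exits without returning.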
fuz answered Oct 18 '22 08:10