Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Inject shared library into a process

Tags:

c

linux

I just started to learn injection techniques in Linux and want to write a simple program to inject a shared library into a running process. (the library will simply print a string.) However, after a couple of hours research, I couldn't find any complete example. Well, I did figure out I probably need to use ptrace() to pause the process and inject the contents, but not sure how to load the library into the memory space of target process and relocation stuff in C code. Does anyone know any good resources or working examples for shared library injection? (Of course, I know there might be some existing libraries like hotpatch I can use to make injection much easier but that's not what I want)

And if anyone can write some pseudo code or give me an example, I will appreciate it. Thanks.

PS: I am not asking about LD_PRELOAD trick.

like image 774
user1726119 Avatar asked Jun 22 '14 20:06

user1726119


People also ask

Are shared libraries shared between processes?

Shared code is loaded into memory once in the shared library segment and shared by all processes that reference it. The advantages of shared libraries are: Less disk space is used because the shared library code is not included in the executable programs.

What is a shared object injection?

Shared Object InjectionOnce a program is executed, it will seek to load the necessary shared objects. We can use a program called strace to track the shared objects that being called. If a shared object were not found, we can hijack it and write a malicious script to spawn a root shell when it is loaded.

How do Linux shared libraries work?

Shared libraries are the most common way to manage dependencies on Linux systems. These shared resources are loaded into memory before the application starts, and when several processes require the same library, it will be loaded only once on the system. This feature saves on memory usage by the application.

What is the purpose of using shared libraries?

Shared Libraries are the libraries that can be linked to any program at run-time. They provide a means to use code that can be loaded anywhere in the memory. Once loaded, the shared library code can be used by any number of programs.


1 Answers

The "LD_PRELOAD trick" André Puel mentioned in a comment to the original question, is no trick, really. It is the standard method of adding functionality -- or more commonly, interposing existing functionality -- in a dynamically-linked process. It is standard functionality provided by ld.so, the Linux dynamic linker.

The Linux dynamic linker is controlled by environment variables (and configuration files); LD_PRELOAD is simply an environment variable that provides a list of dynamic libraries that should be linked against each process. (You could also add the library to /etc/ld.so.preload, in which case it is automatically loaded for every binary, regardless of the LD_PRELOAD environment variable.)

Here's an example, example.c:

#include <unistd.h>
#include <errno.h>

static void init(void) __attribute__((constructor));

static void wrerr(const char *p)
{
    const char *q;
    int        saved_errno;

    if (!p)
        return;

    q = p;
    while (*q)
        q++;

    if (q == p)
        return;

    saved_errno = errno;

    while (p < q) {
        ssize_t n = write(STDERR_FILENO, p, (size_t)(q - p));
        if (n > 0)
            p += n;
        else
        if (n != (ssize_t)-1 || errno != EINTR)
            break;
    }

    errno = saved_errno;
}

static void init(void)
{
    wrerr("I am loaded and running.\n");
}

Compile it to libexample.so using

gcc -Wall -O2 -fPIC -shared example.c -ldl -Wl,-soname,libexample.so -o libexample.so

If you then run any (dynamically linked) binary with the full path to libexample.so listed in LD_PREALOD environment variable, the binary will output "I am loaded and running" to standard output before its normal output. For example,

LD_PRELOAD=$PWD/libexample.so date

will output something like

I am loaded and running.
Mon Jun 23 21:30:00 UTC 2014

Note that the init() function in the example library is automatically executed, because it is marked __attribute__((constructor)); that attribute means the function will be executed prior to main().

My example library may seem funny to you -- no printf() et cetera, wrerr() messing with errno --, but there are very good reasons I wrote it like this.

First, errno is a thread-local variable. If you run some code, initially saving the original errno value, and restoring that value just before returning, the interrupted thread will not see any change in errno. (And because it is thread-local, nobody else will see any change either, unless you try something silly like &errno.) Code that is supposed to run without the rest of the process noticing random effects, better make sure it keeps errno unchanged in this manner!

The wrerr() function itself is a simple function that writes a string safely to standard error. It is async-signal-safe (meaning you can use it in signal handlers, unlike printf() et al.), and other than errno which is kept unchanged, it does not affect the state of the rest of the process in any way. Simply put, it is a safe way to output strings to standard error. It is also simple enough for everbody to understand.

Second, not all processes use standard C I/O. For example, programs compiled in Fortran do not. So, if you try to use standard C I/O, it might work, it might not, or it might even confuse the heck out of the target binary. Using the wrerr() function avoids all that: it will just write the string to standard error, without confusing the rest of the process, no matter what programming language it was written in -- well, as long as that language's runtime does not move or close the standard error file descriptor (STDERR_FILENO == 2).


To load that library dynamically in a running process, you'll need to first attach ptrace to it, then stop it before next entry to a syscall (PTRACE_SYSEMU), to make sure you're somewhere you can safely do the dlopen call.

Check /proc/PID/maps to verify you are within the process' own code, not in shared library code. You can do PTRACE_SYSCALL or PTRACE_SYSEMU to continue to next candidate stopping point. Also, remember to wait() for the child to actually stop after attaching to it, and that you attach to all threads.

While stopped, use PTRACE_GETREGS to get the register state, and PTRACE_PEEKTEXT to copy enough code, so you can replace it with PTRACE_POKETEXT to a position-independent sequence that calls dlopen("/path/to/libexample.so", RTLD_NOW), RTLD_NOW being an integer constant defined for your architecture in /usr/include/.../dlfcn.h, typically 2. Since the pathname is constant string, you can save it (temporarily) over the code; the function call takes a pointer to it, after all.

Have that position-independent sequence you used to rewrite some of the existing code end with a syscall, so that you can run the inserted using PTRACE_SYSCALL (in a loop, until it ends up at that inserted syscall) without having to single-step it. Then you use PTRACE_POKETEXT to revert the code to its original state, and finally PTRACE_SETREGS to revert the program state to what its initial state was.


Consider this trivial program, compiled as say target:

#include <stdio.h>
int main(void)
{
    int c;
    while (EOF != (c = getc(stdin)))
        putc(c, stdout);
    return 0;
}

Let's say we're already running that (pid $(ps -o pid= -C target)), and we wish to inject code that prints "Hello, world!" to standard error.

On x86-64, kernel syscalls are done using the syscall instruction (0F 05 in binary; it's a two-byte instruction). So, to execute any syscall you want on behalf of a target process, you need to replace two bytes. (On x86-64 PTRACE_POKETEXT actually transfers a 64-bit word, preferably aligned on a 64-bit boundary.)

Consider the following program, compiled to say agent:

#define  _GNU_SOURCE
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    struct user_regs_struct oldregs, regs;
    unsigned long  pid, addr, save[2];
    siginfo_t      info;
    char           dummy;

    if (argc != 3 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
        fprintf(stderr, "       %s PID ADDRESS\n", argv[0]);
        fprintf(stderr, "\n");
        return 1;
    }

    if (sscanf(argv[1], " %lu %c", &pid, &dummy) != 1 || pid < 1UL) {
        fprintf(stderr, "%s: Invalid process ID.\n", argv[1]);
        return 1;
    }

    if (sscanf(argv[2], " %lx %c", &addr, &dummy) != 1) {
        fprintf(stderr, "%s: Invalid address.\n", argv[2]);
        return 1;
    }
    if (addr & 7) {
        fprintf(stderr, "%s: Address is not a multiple of 8.\n", argv[2]);
        return 1;
    }

    /* Attach to the target process. */
    if (ptrace(PTRACE_ATTACH, (pid_t)pid, NULL, NULL)) {
        fprintf(stderr, "Cannot attach to process %lu: %s.\n", pid, strerror(errno));
        return 1;
    }

    /* Wait for attaching to complete. */
    waitid(P_PID, (pid_t)pid, &info, WSTOPPED);

    /* Get target process (main thread) register state. */
    if (ptrace(PTRACE_GETREGS, (pid_t)pid, NULL, &oldregs)) {
        fprintf(stderr, "Cannot get register state from process %lu: %s.\n", pid, strerror(errno));
        ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
        return 1;
    }

    /* Save the 16 bytes at the specified address in the target process. */
    save[0] = ptrace(PTRACE_PEEKTEXT, (pid_t)pid, (void *)(addr + 0UL), NULL);
    save[1] = ptrace(PTRACE_PEEKTEXT, (pid_t)pid, (void *)(addr + 8UL), NULL);

    /* Replace the 16 bytes with 'syscall' (0F 05), followed by the message string. */
    if (ptrace(PTRACE_POKETEXT, (pid_t)pid, (void *)(addr + 0UL), (void *)0x2c6f6c6c6548050fULL) ||
        ptrace(PTRACE_POKETEXT, (pid_t)pid, (void *)(addr + 8UL), (void *)0x0a21646c726f7720ULL)) {
        fprintf(stderr, "Cannot modify process %lu code: %s.\n", pid, strerror(errno));
        ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
        return 1;
    }

    /* Modify process registers, to execute the just inserted code. */
    regs = oldregs;
    regs.rip = addr;
    regs.rax = SYS_write;
    regs.rdi = STDERR_FILENO;
    regs.rsi = addr + 2UL;
    regs.rdx = 14; /* 14 bytes of message, no '\0' at end needed. */
    if (ptrace(PTRACE_SETREGS, (pid_t)pid, NULL, &regs)) {
        fprintf(stderr, "Cannot set register state from process %lu: %s.\n", pid, strerror(errno));
        ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
        return 1;
    }

    /* Do the syscall. */
    if (ptrace(PTRACE_SINGLESTEP, (pid_t)pid, NULL, NULL)) {
        fprintf(stderr, "Cannot execute injected code to process %lu: %s.\n", pid, strerror(errno));
        ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
        return 1;
    }

    /* Wait for the client to execute the syscall, and stop. */
    waitid(P_PID, (pid_t)pid, &info, WSTOPPED);

    /* Revert the 16 bytes we modified. */
    if (ptrace(PTRACE_POKETEXT, (pid_t)pid, (void *)(addr + 0UL), (void *)save[0]) ||
        ptrace(PTRACE_POKETEXT, (pid_t)pid, (void *)(addr + 8UL), (void *)save[1])) {
        fprintf(stderr, "Cannot revert process %lu code modifications: %s.\n", pid, strerror(errno));
        ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
        return 1;
    }

    /* Revert the registers, too, to the old state. */
    if (ptrace(PTRACE_SETREGS, (pid_t)pid, NULL, &oldregs)) {
        fprintf(stderr, "Cannot reset register state from process %lu: %s.\n", pid, strerror(errno));
        ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
        return 1;
    }

    /* Detach. */
    if (ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL)) {
        fprintf(stderr, "Cannot detach from process %lu: %s.\n", pid, strerror(errno));
        return 1;
    }

    fprintf(stderr, "Done.\n");
    return 0;
}

It takes two parameters: the pid of the target process, and the address to use to replace with the injected executable code.

The two magic constants, 0x2c6f6c6c6548050fULL and 0x0a21646c726f7720ULL, are simply the native representation on x86-64 for the 16 bytes

0F 05 "Hello, world!\n"

with no string-terminating NUL byte. Note that the string is 14 characters long, and starts two bytes after the original address.

On my machine, running cat /proc/$(ps -o pid= -C target)/maps -- which shows the complete address mapping for the target -- shows that target's code is located at 0x400000 .. 0x401000. objdump -d ./target shows that there is no code after 0x4006ef or so. Therefore, addresses 0x400700 to 0x401000 are reserved for executable code, but do not contain any. The address 0x400700 -- on my machine; may very well differ on yours! -- is therefore a very good address for injecting code into target while it is running.

Running ./agent $(ps -o pid= -C target) 0x400700 injects the necessary syscall code and string to the target binary at 0x400700, executes the injected code, and replaces the injected code with original code. Essentially, it accomplishes the desired task: for target to output "Hello, world!" to standard error.

Note that Ubuntu and some other Linux distributions nowadays allow a process to ptrace only their child processes running as the same user. Since target is not a child of agent, you either need to have superuser privileges (run sudo ./agent $(ps -o pid= -C target) 0x400700), or modify target so that it explicitly allows the ptracing (for example, by adding prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY); near the start of the program). See man ptrace and man prctl for details.

Like I explained already above, for longer or more complicated code, use ptrace to cause the target to first execute mmap(NULL, page_aligned_length, PROT_READ | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0), which allocates executable memory for new code. So, on x86-64, you only need to locate one 64-bit word you can replace safely, and then you can PTRACE_POKETEXT the new code for the target to execute. While my example uses the write() syscall, it is a really small change to have it use mmap() or mmap2() syscall instead.

(On x86-64 in Linux, the syscall number is in rax, and parameters in rdi, rsi, rdx, r10, r8, and r9, reading from left to right, respectively; and return value is also in rax.)

Parsing /proc/PID/maps is very useful -- see /proc/PID/maps under man 5 proc. It provides all the pertinent information on the target process address space. To find out whether there are useful unused code areas, parse objdump -wh /proc/$(ps -o pid= -C target)/exe output; it examines the actual binary of the target process directly. (In fact, you could easily find how much unused code there is at the end of the code mapping, and use that automatically.)

Further questions?

like image 165
Nominal Animal Avatar answered Oct 06 '22 05:10

Nominal Animal