I just started to learn injection techniques in Linux and want to write a simple program to inject a shared library into a running process. (The library will simply print a string.) However, after a couple of hours of research, I couldn't find any complete example. I did figure out that I probably need to use ptrace() to pause the process and inject the contents, but I am not sure how to load the library into the memory space of the target process and handle the relocation in C code. Does anyone know any good resources or working examples for shared library injection? (Of course, I know there are existing libraries like hotpatch that would make injection much easier, but that's not what I want.)
And if anyone can write some pseudo code or give me an example, I will appreciate it. Thanks.
PS: I am not asking about LD_PRELOAD trick.
Shared code is loaded into memory once in the shared library segment and shared by all processes that reference it. The advantages of shared libraries are: Less disk space is used because the shared library code is not included in the executable programs.
Shared Object Injection: Once a program is executed, it will seek to load the necessary shared objects. We can use a program called strace to track the shared objects that are being loaded. If a shared object is not found, we can hijack it and write a malicious script to spawn a root shell when it is loaded.
Shared libraries are the most common way to manage dependencies on Linux systems. These shared resources are loaded into memory before the application starts, and when several processes require the same library, it will be loaded only once on the system. This feature saves on memory usage by the application.
Shared Libraries are the libraries that can be linked to any program at run-time. They provide a means to use code that can be loaded anywhere in the memory. Once loaded, the shared library code can be used by any number of programs.
The "LD_PRELOAD trick" André Puel mentioned in a comment to the original question is no trick, really. It is the standard method of adding functionality -- or more commonly, interposing existing functionality -- in a dynamically-linked process. It is standard functionality provided by ld.so, the Linux dynamic linker.
The Linux dynamic linker is controlled by environment variables (and configuration files); LD_PRELOAD is simply an environment variable that provides a list of dynamic libraries that should be linked against each process. (You could also add the library to /etc/ld.so.preload, in which case it is automatically loaded for every binary, regardless of the LD_PRELOAD environment variable.)
Here's an example, example.c:
#include <unistd.h>
#include <errno.h>

static void init(void) __attribute__((constructor));

/* Write a string to standard error, leaving errno unchanged. */
static void wrerr(const char *p)
{
    const char *q;
    int saved_errno;

    if (!p)
        return;

    /* Find the end of the string. */
    q = p;
    while (*q)
        q++;
    if (q == p)
        return;

    saved_errno = errno;

    /* Retry on short writes and on EINTR; give up on any other error. */
    while (p < q) {
        ssize_t n = write(STDERR_FILENO, p, (size_t)(q - p));
        if (n > 0)
            p += n;
        else
        if (n != (ssize_t)-1 || errno != EINTR)
            break;
    }

    errno = saved_errno;
}

static void init(void)
{
    wrerr("I am loaded and running.\n");
}
Compile it to libexample.so using
gcc -Wall -O2 -fPIC -shared example.c -ldl -Wl,-soname,libexample.so -o libexample.so
If you then run any (dynamically linked) binary with the full path to libexample.so listed in the LD_PRELOAD environment variable, the binary will output "I am loaded and running." to standard error before its normal output. For example,
LD_PRELOAD=$PWD/libexample.so date
will output something like
I am loaded and running.
Mon Jun 23 21:30:00 UTC 2014
Note that the init() function in the example library is automatically executed, because it is marked __attribute__((constructor)); that attribute means the function will be executed prior to main().
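As a minimal sketch of the constructor mechanism described above (the names on_load, constructor_ran and constructor_has_run are hypothetical, and __attribute__((constructor)) is a GCC/Clang extension rather than standard C), a flag set by the constructor is already visible by the time any ordinary code in the process runs:

```c
/* Hypothetical flag; statically initialized to 0 before any constructor runs. */
static int constructor_ran = 0;

/* Runs when the object is loaded, before main() is entered (GCC/Clang extension). */
__attribute__((constructor))
static void on_load(void)
{
    constructor_ran = 1;
}

/* Returns 1 if the constructor above has already executed. */
int constructor_has_run(void)
{
    return constructor_ran;
}
```

Calling constructor_has_run() at the very start of main() should return 1, which is the same ordering guarantee the example library relies on.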
My example library may seem funny to you -- no printf() et cetera, wrerr() messing with errno --, but there are very good reasons I wrote it like this.
First, errno is a thread-local variable. If you run some code, initially saving the original errno value, and restoring that value just before returning, the interrupted thread will not see any change in errno. (And because it is thread-local, nobody else will see any change either, unless you try something silly like &errno.) Code that is supposed to run without the rest of the process noticing random effects had better make sure it keeps errno unchanged in this manner!
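The save-and-restore pattern can be sketched in isolation as follows. The helper name quiet_probe is hypothetical, and close(-1) is used only because it is a call that reliably fails and overwrites errno:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: does something that clobbers errno, but hides that
 * from the caller by restoring the value errno had on entry. */
void quiet_probe(void)
{
    int saved_errno = errno;

    /* close(-1) always fails and sets errno to EBADF. */
    (void)close(-1);

    errno = saved_errno;
}
```

If errno holds EINTR before the call, it still holds EINTR afterwards, even though the failed close() set it to EBADF in between.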
The wrerr() function itself is a simple function that writes a string safely to standard error. It is async-signal-safe (meaning you can use it in signal handlers, unlike printf() et al.), and other than errno, which is kept unchanged, it does not affect the state of the rest of the process in any way. Simply put, it is a safe way to output strings to standard error. It is also simple enough for everybody to understand.
Second, not all processes use standard C I/O. For example, programs compiled in Fortran do not. So, if you try to use standard C I/O, it might work, it might not, or it might even confuse the heck out of the target binary. Using the wrerr() function avoids all that: it will just write the string to standard error, without confusing the rest of the process, no matter what programming language it was written in -- well, as long as that language's runtime does not move or close the standard error file descriptor (STDERR_FILENO == 2).
To load that library dynamically in a running process, you'll need to first attach to it using ptrace, then stop it before the next entry to a syscall (PTRACE_SYSEMU), to make sure you're somewhere you can safely do the dlopen call.
Check /proc/PID/maps to verify you are within the process' own code, not in shared library code. You can do PTRACE_SYSCALL or PTRACE_SYSEMU to continue to the next candidate stopping point. Also, remember to wait() for the child to actually stop after attaching to it, and that you attach to all threads.
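To attach to all threads you first need their thread IDs. One way to get them, sketched here under the assumption that /proc is mounted, is to list the numeric entries under /proc/PID/task. The helper name list_threads is hypothetical and not part of any library:

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: store up to 'max' thread IDs of process 'pid' in
 * tids[]. Returns the total number of threads found (which may exceed
 * 'max'), or -1 if /proc/PID/task cannot be opened. */
int list_threads(pid_t pid, pid_t *tids, int max)
{
    char path[64];
    DIR *dir;
    struct dirent *ent;
    int count = 0;

    snprintf(path, sizeof path, "/proc/%ld/task", (long)pid);
    dir = opendir(path);
    if (!dir)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);

        /* Skip "." and ".." and anything else that is not a plain number. */
        if (end == ent->d_name || *end != '\0' || tid <= 0)
            continue;

        if (count < max)
            tids[count] = (pid_t)tid;
        count++;
    }

    closedir(dir);
    return count;
}
```

Each returned ID can then be passed to PTRACE_ATTACH and waited for individually. Threads created between listing and attaching are not covered by this sketch, so a careful tool would rescan until no new IDs appear.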
While stopped, use PTRACE_GETREGS to get the register state, and PTRACE_PEEKTEXT to copy enough code, so you can replace it using PTRACE_POKETEXT with a position-independent sequence that calls dlopen("/path/to/libexample.so", RTLD_NOW), RTLD_NOW being an integer constant defined for your architecture in /usr/include/.../dlfcn.h, typically 2. Since the pathname is a constant string, you can save it (temporarily) over the code; the function call takes a pointer to it, after all.
Have that position-independent sequence you used to rewrite some of the existing code end with a syscall, so that you can run the inserted code using PTRACE_SYSCALL (in a loop, until it ends up at that inserted syscall) without having to single-step it. Then you use PTRACE_POKETEXT to revert the code to its original state, and finally PTRACE_SETREGS to revert the program state to what it initially was.
Consider this trivial program, compiled as say target:
#include <stdio.h>

int main(void)
{
    int c;

    /* Copy standard input to standard output until end of input. */
    while (EOF != (c = getc(stdin)))
        putc(c, stdout);

    return 0;
}
Let's say we're already running that (pid $(ps -o pid= -C target)), and we wish to inject code that prints "Hello, world!" to standard error.
On x86-64, kernel syscalls are done using the syscall instruction (0F 05 in binary; it's a two-byte instruction). So, to execute any syscall you want on behalf of a target process, you need to replace two bytes. (On x86-64 PTRACE_POKETEXT actually transfers a 64-bit word, preferably aligned on a 64-bit boundary.)
Consider the following program, compiled to say agent:
#define _GNU_SOURCE
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <signal.h>     /* siginfo_t */
#include <unistd.h>     /* STDERR_FILENO */
#include <string.h>
#include <errno.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    struct user_regs_struct oldregs, regs;
    unsigned long pid, addr, save[2];
    siginfo_t info;
    char dummy;

    if (argc != 3 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
        fprintf(stderr, "       %s PID ADDRESS\n", argv[0]);
        fprintf(stderr, "\n");
        return 1;
    }

    if (sscanf(argv[1], " %lu %c", &pid, &dummy) != 1 || pid < 1UL) {
        fprintf(stderr, "%s: Invalid process ID.\n", argv[1]);
        return 1;
    }
    if (sscanf(argv[2], " %lx %c", &addr, &dummy) != 1) {
        fprintf(stderr, "%s: Invalid address.\n", argv[2]);
        return 1;
    }
    if (addr & 7) {
        fprintf(stderr, "%s: Address is not a multiple of 8.\n", argv[2]);
        return 1;
    }

    /* Attach to the target process. */
    if (ptrace(PTRACE_ATTACH, (pid_t)pid, NULL, NULL)) {
        fprintf(stderr, "Cannot attach to process %lu: %s.\n", pid, strerror(errno));
        return 1;
    }

    /* Wait for attaching to complete. */
    waitid(P_PID, (pid_t)pid, &info, WSTOPPED);

    /* Get target process (main thread) register state. */
    if (ptrace(PTRACE_GETREGS, (pid_t)pid, NULL, &oldregs)) {
        fprintf(stderr, "Cannot get register state from process %lu: %s.\n", pid, strerror(errno));
        ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
        return 1;
    }

    /* Save the 16 bytes at the specified address in the target process.
       PTRACE_PEEKTEXT returns the data itself, so errors show up only in errno. */
    errno = 0;
    save[0] = ptrace(PTRACE_PEEKTEXT, (pid_t)pid, (void *)(addr + 0UL), NULL);
    save[1] = ptrace(PTRACE_PEEKTEXT, (pid_t)pid, (void *)(addr + 8UL), NULL);
    if (errno) {
        fprintf(stderr, "Cannot read process %lu code: %s.\n", pid, strerror(errno));
        ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
        return 1;
    }

    /* Replace the 16 bytes with 'syscall' (0F 05), followed by the message string. */
    if (ptrace(PTRACE_POKETEXT, (pid_t)pid, (void *)(addr + 0UL), (void *)0x2c6f6c6c6548050fULL) ||
        ptrace(PTRACE_POKETEXT, (pid_t)pid, (void *)(addr + 8UL), (void *)0x0a21646c726f7720ULL)) {
        fprintf(stderr, "Cannot modify process %lu code: %s.\n", pid, strerror(errno));
        ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
        return 1;
    }

    /* Modify process registers, to execute the just inserted code. */
    regs = oldregs;
    regs.rip = addr;
    regs.rax = SYS_write;
    regs.rdi = STDERR_FILENO;
    regs.rsi = addr + 2UL;
    regs.rdx = 14; /* 14 bytes of message, no '\0' at end needed. */
    if (ptrace(PTRACE_SETREGS, (pid_t)pid, NULL, &regs)) {
        fprintf(stderr, "Cannot set register state from process %lu: %s.\n", pid, strerror(errno));
        ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
        return 1;
    }

    /* Do the syscall. */
    if (ptrace(PTRACE_SINGLESTEP, (pid_t)pid, NULL, NULL)) {
        fprintf(stderr, "Cannot execute injected code to process %lu: %s.\n", pid, strerror(errno));
        ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
        return 1;
    }

    /* Wait for the client to execute the syscall, and stop. */
    waitid(P_PID, (pid_t)pid, &info, WSTOPPED);

    /* Revert the 16 bytes we modified. */
    if (ptrace(PTRACE_POKETEXT, (pid_t)pid, (void *)(addr + 0UL), (void *)save[0]) ||
        ptrace(PTRACE_POKETEXT, (pid_t)pid, (void *)(addr + 8UL), (void *)save[1])) {
        fprintf(stderr, "Cannot revert process %lu code modifications: %s.\n", pid, strerror(errno));
        ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
        return 1;
    }

    /* Revert the registers, too, to the old state. */
    if (ptrace(PTRACE_SETREGS, (pid_t)pid, NULL, &oldregs)) {
        fprintf(stderr, "Cannot reset register state from process %lu: %s.\n", pid, strerror(errno));
        ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
        return 1;
    }

    /* Detach. */
    if (ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL)) {
        fprintf(stderr, "Cannot detach from process %lu: %s.\n", pid, strerror(errno));
        return 1;
    }

    fprintf(stderr, "Done.\n");
    return 0;
}
It takes two parameters: the pid of the target process, and the address in the target whose contents are temporarily replaced with the injected executable code.
The two magic constants, 0x2c6f6c6c6548050fULL and 0x0a21646c726f7720ULL, are simply the native representation on x86-64 for the 16 bytes 0F 05 "Hello, world!\n" with no string-terminating NUL byte. Note that the string is 14 characters long, and starts two bytes after the original address.
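Rather than working the constants out by hand, they can be computed. The following is a small sketch (the helper names pack_le64 and build_payload are hypothetical) that packs the 16 payload bytes into two little-endian 64-bit words, which is the layout PTRACE_POKETEXT uses on x86-64:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack 8 bytes into a 64-bit word so that bytes[0]
 * ends up in the least significant byte (little-endian memory order). */
uint64_t pack_le64(const unsigned char *bytes)
{
    uint64_t word = 0;
    int i;

    for (i = 7; i >= 0; i--)
        word = (word << 8) | bytes[i];
    return word;
}

/* Build the two words for: 0F 05 (syscall), then "Hello, world!\n". */
void build_payload(uint64_t words[2])
{
    unsigned char payload[16] = { 0x0f, 0x05 };

    memcpy(payload + 2, "Hello, world!\n", 14);   /* 14 bytes, no NUL */
    words[0] = pack_le64(payload);
    words[1] = pack_le64(payload + 8);
}
```

After build_payload(words), words[0] is 0x2c6f6c6c6548050f and words[1] is 0x0a21646c726f7720, matching the constants used in agent.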
On my machine, running cat /proc/$(ps -o pid= -C target)/maps -- which shows the complete address mapping for the target -- shows that target's code is located at 0x400000 .. 0x401000. objdump -d ./target shows that there is no code after 0x4006ef or so. Therefore, addresses 0x400700 to 0x401000 are reserved for executable code, but do not contain any. The address 0x400700 -- on my machine; may very well differ on yours! -- is therefore a very good address for injecting code into target while it is running.
Running ./agent $(ps -o pid= -C target) 0x400700 injects the necessary syscall code and string to the target binary at 0x400700, executes the injected code, and replaces the injected code with original code. Essentially, it accomplishes the desired task: for target to output "Hello, world!" to standard error.
Note that Ubuntu and some other Linux distributions nowadays allow a process to ptrace only its own child processes running as the same user. Since target is not a child of agent, you either need to have superuser privileges (run sudo ./agent $(ps -o pid= -C target) 0x400700), or modify target so that it explicitly allows the ptracing (for example, by adding prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY); near the start of the program). See man ptrace and man prctl for details.
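On distributions that implement this restriction through the Yama security module, the active policy can be read from /proc/sys/kernel/yama/ptrace_scope. A minimal sketch, assuming that file is where your kernel exposes the setting (the helper name read_ptrace_scope is hypothetical):

```c
#include <stdio.h>

/* Hypothetical helper: return the Yama ptrace_scope value, or -1 if the
 * file does not exist or cannot be parsed (for example, Yama not enabled). */
int read_ptrace_scope(void)
{
    FILE *f = fopen("/proc/sys/kernel/yama/ptrace_scope", "r");
    int scope = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &scope) != 1)
        scope = -1;
    fclose(f);
    return scope;
}
```

A value of 0 means the classic behaviour (any process of the same user may attach), while 1 is the restricted mode described above, where only descendants or explicitly permitted tracers may attach without extra privileges.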
For longer or more complicated code, use ptrace to cause the target to first execute mmap(NULL, page_aligned_length, PROT_READ | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0), which allocates executable memory for new code. So, on x86-64, you only need to locate one 64-bit word you can replace safely, and then you can PTRACE_POKETEXT the new code for the target to execute. While my example uses the write() syscall, it is a really small change to have it use the mmap() syscall instead (or mmap2() on 32-bit architectures that have it).
(On x86-64 in Linux, the syscall number is in rax, and parameters in rdi, rsi, rdx, r10, r8, and r9, reading from left to right, respectively; and return value is also in rax.)
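Applying that register convention to the mmap() call mentioned above might look like the following sketch. It is x86-64 only, the helper name prepare_mmap_call is hypothetical, and it only fills in a register structure; the caller is still expected to pass it to PTRACE_SETREGS and step over a syscall instruction located at syscall_addr, as agent does for write():

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/user.h>

#if defined(__x86_64__)
/* Hypothetical helper: set up registers so that executing the 'syscall'
 * instruction at syscall_addr performs
 * mmap(NULL, length, PROT_READ | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0).
 * 'length' is assumed to be a multiple of the page size. */
void prepare_mmap_call(struct user_regs_struct *regs,
                       unsigned long syscall_addr, unsigned long length)
{
    regs->rip = syscall_addr;                   /* where the 0F 05 bytes are */
    regs->rax = SYS_mmap;                       /* syscall number            */
    regs->rdi = 0;                              /* addr = NULL               */
    regs->rsi = length;                         /* length                    */
    regs->rdx = PROT_READ | PROT_EXEC;          /* prot                      */
    regs->r10 = MAP_PRIVATE | MAP_ANONYMOUS;    /* flags                     */
    regs->r8  = (unsigned long long)-1;         /* fd = -1                   */
    regs->r9  = 0;                              /* offset                    */
}
#endif
```

After the syscall completes, a second PTRACE_GETREGS would give the address of the new mapping in rax (or a small negative errno value on failure).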
Parsing /proc/PID/maps is very useful -- see /proc/PID/maps under man 5 proc. It provides all the pertinent information on the target process address space. To find out whether there are useful unused code areas, parse objdump -wh /proc/$(ps -o pid= -C target)/exe output; it examines the actual binary of the target process directly. (In fact, you could easily find how much unused code there is at the end of the code mapping, and use that automatically.)
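A minimal sketch of the /proc/PID/maps parsing follows. The helper name first_exec_mapping is hypothetical; it returns the first mapping that has the execute permission bit set, which is often, but not necessarily, the code of the target binary itself, so a real tool should also compare the pathname column against /proc/PID/exe:

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: find the first executable mapping of process 'pid'.
 * On success stores its address range in *start and *end and returns 0;
 * returns -1 if the maps file cannot be read or no such mapping exists. */
int first_exec_mapping(pid_t pid, unsigned long *start, unsigned long *end)
{
    char path[64], line[1024], perms[5];
    unsigned long lo, hi;
    FILE *maps;

    snprintf(path, sizeof path, "/proc/%ld/maps", (long)pid);
    maps = fopen(path, "r");
    if (!maps)
        return -1;

    /* Each line looks like: "00400000-00401000 r-xp 00000000 08:01 123 /path". */
    while (fgets(line, sizeof line, maps)) {
        if (sscanf(line, "%lx-%lx %4s", &lo, &hi, perms) == 3 && perms[2] == 'x') {
            *start = lo;
            *end = hi;
            fclose(maps);
            return 0;
        }
    }

    fclose(maps);
    return -1;
}
```

The end of the returned range, combined with the last code address reported by objdump, gives the unused tail of the code mapping that can be used as the ADDRESS argument for agent.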
Further questions?