I'm currently working on a project where a parent process sets up a socketpair, forks, and then uses this socketpair to communicate with the child. Whenever the child wants to open a file (or any other file-descriptor-based resource), it should always go to the parent, request the resource, and get the fd sent via the socketpair. Furthermore, I want to prevent the child from opening any file descriptor by itself.
I stumbled over setrlimit, which successfully prevents the child from opening new file descriptors, but it also seems to invalidate any file descriptors sent over the initial socket connection. Is there any method on Linux that allows a single process to open any file, send its file descriptor to other processes, and let them use it, without allowing these other processes to open any file descriptor by themselves?
For my use case, this can be any kernel configuration, system call, etc., as long as it can be applied after fork and applies to all file descriptors (not just files but also sockets, socketpairs, etc.).
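For reference, the fd passing described in the question is normally done with sendmsg(2)/recvmsg(2) and an SCM_RIGHTS control message on a UNIX-domain socket. Below is a minimal sketch of two helpers, assuming a connected AF_UNIX socket such as one end of the socketpair; the names send_fd, recv_fd and fd_ctrl are hypothetical, not taken from the question.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>   /* read/write/close for callers */

/* Control-message buffer large enough for one int, correctly aligned. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one descriptor over a connected AF_UNIX socket.
   Linux needs at least one byte of real data to deliver ancillary
   data on a stream socket, so a dummy byte is sent (see unix(7)).
   Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;            /* "pass these descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new descriptor number or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    /* Fail on error/EOF, or if the control data was truncated
       (in which case the passed descriptor was dropped). */
    if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor is a new entry in the receiver's own descriptor table that refers to the same open file. Because receiving allocates a descriptor, it is subject to the receiver's RLIMIT_NOFILE limit (unix(7) notes that excess received descriptors are closed), which is consistent with the setrlimit behaviour described above.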
Yes, sockets are indices into the same table as files, at least on UNIX systems (like Linux and OSX). Windows is different, which is why there you can't use e.g. read and write to receive and send data on a socket. Each process has its own "file" descriptor table.
File descriptors are generally unique to each process, but they can be shared by child processes created with a fork subroutine or copied by the fcntl, dup, and dup2 subroutines.
Because each file or socket has a unique descriptor, the system knows exactly where to send and to receive the data.
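To illustrate that a socket descriptor lives in the same table as file descriptors, here is a small sketch (socket_roundtrip is a hypothetical helper) that moves data through a socketpair(2) using plain write(2) and read(2), the same calls used on regular files.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `text` into one end of a socketpair and read it back from the
   other end with ordinary read(2)/write(2).
   Returns 0 if the bytes came back unchanged, -1 otherwise. */
static int socket_roundtrip(const char *text)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* sv[0] and sv[1] are small integers from this process's
       descriptor table, just like the return value of open(2). */
    size_t len = strlen(text);
    char buf[64];
    int rc = -1;
    if (len < sizeof(buf) &&
        write(sv[0], text, len) == (ssize_t)len &&
        read(sv[1], buf, len) == (ssize_t)len &&
        memcmp(buf, text, len) == 0)
        rc = 0;

    close(sv[0]);
    close(sv[1]);
    return rc;
}
```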
What you have here is exactly the use case of seccomp.
Using seccomp, you can filter syscalls in different ways. In this situation, what you want to do is install, right after fork(), a seccomp filter that disallows the use of open(2), openat(2), socket(2) (and more).
To accomplish this, you can do the following:

1. Create a new context with seccomp_init(3), using SCMP_ACT_ALLOW as the default action.
2. Call seccomp_rule_add(3) for each syscall that you want to deny. You can use SCMP_ACT_KILL to kill the process if the syscall is attempted, SCMP_ACT_ERRNO(val) to make the syscall fail and return the specified errno value, or any other action value defined in the manual page.
3. Call seccomp_load(3) to make the filter effective.

Before continuing, NOTE that a blacklist approach like this one is in general weaker than a whitelist approach. It allows any syscall that is not explicitly disallowed, which could result in a bypass of the filter. If you believe the child process you want to execute could be maliciously trying to evade the filter, or if you already know which syscalls the children will need, a whitelist approach is better, and you should do the opposite of the above: create the filter with the default action SCMP_ACT_KILL and allow the needed syscalls with SCMP_ACT_ALLOW. In terms of code, the difference is minimal (the whitelist is probably longer, but the steps are the same).
Here's an example of the above (I'm doing exit(-1) in case of error just for simplicity's sake):
#include <stdlib.h>
#include <seccomp.h>

static void secure(void) {
    int err;
    scmp_filter_ctx ctx;

    int blacklist[] = {
        SCMP_SYS(open),
        SCMP_SYS(openat),
        SCMP_SYS(creat),
        SCMP_SYS(socket),
        SCMP_SYS(open_by_handle_at),
        // ... possibly more ...
    };

    // Create a new seccomp context, allowing every syscall by default.
    ctx = seccomp_init(SCMP_ACT_ALLOW);
    if (ctx == NULL)
        exit(-1);

    /* Now add a filter for each syscall that you want to disallow.
       In this case, we'll use SCMP_ACT_KILL to kill the process if it
       attempts to execute the specified syscall. */
    for (unsigned i = 0; i < sizeof(blacklist) / sizeof(blacklist[0]); i++) {
        err = seccomp_rule_add(ctx, SCMP_ACT_KILL, blacklist[i], 0);
        if (err)
            exit(-1);
    }

    // Load the context, making it effective.
    err = seccomp_load(ctx);
    if (err)
        exit(-1);

    // The filter now lives in the kernel; free the userspace context.
    seccomp_release(ctx);
}
Now, in your program, you can call the above function to apply the seccomp filter right after the fork(), like this:
pid_t child_pid = fork();  // pid_t and fork() come from <unistd.h>
if (child_pid == -1)
    exit(-1);

if (child_pid == 0) {
    secure();
    // Child code here...
    exit(0);
} else {
    // Parent code here...
}
A few important notes on seccomp:

- If fork(2) or clone(2) are allowed by the filter, any child processes will be constrained by the same filter.
- If execve(2) is allowed, the existing filter will be preserved across a call to execve(2).
- If the prctl(2) (or seccomp(2)) syscall is allowed, the process is able to apply further filters. These are stacked on top of the existing one, so they can only add restrictions, not lift them.