
Where's the code in the Linux kernel that implements open("/proc/self/fd/NUM")?

I always assumed that doing open("/proc/self/fd/NUM", flags) was equivalent to dup(NUM), but apparently this is not the case! For example, if you dup a file descriptor and then set the new fd to non-blocking, this also affects the original file descriptor (because non-blocking state is a property of the file description, and the two file descriptors both point to the same file description). However, if you open /proc/self/fd/NUM, then you seem to get a new, independent file description, and can set the non-blocking state of your old and new fds independently. You can even use this to get two file descriptions referring to the same anonymous pipe, which is otherwise impossible (example). On the other hand, while you can dup a socket fd, open("/proc/self/fd/NUM", flags) fails if NUM refers to a socket.
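The difference described above can be checked from user space. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper names (reopen_fd, nonblock_leaks, and so on) are made up for illustration and are not part of any API.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: re-open a descriptor through its /proc/self/fd entry. */
static int reopen_fd(int fd, int flags)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    return open(path, flags);
}

/* Returns 1 if O_NONBLOCK is set on fd, 0 if it is clear, -1 on error. */
static int is_nonblocking(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    return fl < 0 ? -1 : (fl & O_NONBLOCK) != 0;
}

/* Set O_NONBLOCK on `other`, then report whether `orig` now has it too. */
static int nonblock_leaks(int orig, int other)
{
    int fl = fcntl(other, F_GETFL);
    if (fl < 0 || fcntl(other, F_SETFL, fl | O_NONBLOCK) < 0)
        return -1;
    return is_nonblocking(orig);
}

/* dup() case: both descriptors refer to one file description. */
static int dup_shares_nonblock(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    int d = dup(p[0]);
    int res = d < 0 ? -1 : nonblock_leaks(p[0], d);
    if (d >= 0)
        close(d);
    close(p[0]);
    close(p[1]);
    return res;
}

/* /proc case: the re-opened read end gets its own file description.
 * The open does not block because the write end p[1] is still open. */
static int reopen_shares_nonblock(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    int r = reopen_fd(p[0], O_RDONLY);
    int res = r < 0 ? -1 : nonblock_leaks(p[0], r);
    if (r >= 0)
        close(r);
    close(p[0]);
    close(p[1]);
    return res;
}
```

Called from a small main(), dup_shares_nonblock() should report that the flag is shared, while reopen_shares_nonblock() should report that the original read end stays blocking.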

Now I'd like to be able to see how this works for other types of special file, and answer questions like "what permission checking is done when re-opening a file this way?", so I was trying to find the code in Linux that actually implements this path, but when I started reading fs/proc/fd.c I quickly got lost in a maze of twisty operations structs, all different.

So my question is: can anyone explain the code path followed by doing open("/proc/self/fd/NUM", flags)? For concreteness let's say that NUM refers to a pipe and we're talking about the latest kernel release.

Asked by Nathaniel J. Smith, May 31 '26 01:05



1 Answer

A comment suggests a look at proc_fd_link and that's a good idea. If you have trouble following how the code can get there, you can help yourself with systemtap. Here is a magic script:

probe kernel.function("proc_fd_link") {
    print_backtrace();
}

Running it while opening a file from under fd/ gives:

 0xffffffffbb2cad70 : proc_fd_link+0x0/0xd0 [kernel]
 0xffffffffbb2c4c3b : proc_pid_get_link+0x6b/0x90 [kernel] (inexact)
 0xffffffffbb36341a : security_inode_follow_link+0x4a/0x70 [kernel] (inexact)
 0xffffffffbb25bf13 : trailing_symlink+0x1e3/0x220 [kernel] (inexact)
 0xffffffffbb25f559 : path_openat+0xe9/0x1380 [kernel] (inexact)
 0xffffffffbb261af1 : do_filp_open+0x91/0x100 [kernel] (inexact)
 0xffffffffbb26fd8f : __alloc_fd+0x3f/0x170 [kernel] (inexact)
 0xffffffffbb24f280 : do_sys_open+0x130/0x220 [kernel] (inexact)
 0xffffffffbb24f38e : sys_open+0x1e/0x20 [kernel] (inexact)
 0xffffffffbb003c57 : do_syscall_64+0x67/0x160 [kernel] (inexact)
 0xffffffffbb8039e1 : return_from_SYSCALL_64+0x0/0x6a [kernel] (inexact)

In proc_pid_get_link we see:

/* Are we allowed to snoop on the tasks file descriptors? */
if (!proc_fd_access_allowed(inode))
        goto out;

and proc_fd_access_allowed itself is:

/* permission checks */
static int proc_fd_access_allowed(struct inode *inode)
{
        struct task_struct *task;
        int allowed = 0;
        /* Allow access to a task's file descriptors if it is us or we
         * may use ptrace attach to the process and find out that
         * information.
         */
        task = get_proc_task(inode);
        if (task) {
                allowed = ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
                put_task_struct(task);
        }
        return allowed;
}

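Clearly, to follow the link you need to pass the same check as a read-mode ptrace access (PTRACE_MODE_READ_FSCREDS) to the target task. When the task is your own process, as with /proc/self, that check passes trivially.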
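The ptrace-style check only governs following the link. After that, as far as I can tell, the open continues as an ordinary open of the underlying file, so the usual inode permission checks are applied to the newly requested flags rather than to the mode of the original descriptor. A minimal sketch of how to observe this, assuming /proc is mounted and /tmp is writable; the helper names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: re-open a descriptor through its /proc/self/fd entry. */
static int reopen_fd(int fd, int flags)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    return open(path, flags);
}

/* Returns 1 if a read-only descriptor on a file we own (mode 0600) can be
 * re-opened for writing, 0 if the re-open or write is refused, and -1 if
 * the setup itself fails. */
static int readonly_fd_reopens_writable(void)
{
    char tmpl[] = "/tmp/reopen-demo-XXXXXX";
    int tmp = mkstemp(tmpl);          /* creates the file with mode 0600 */
    if (tmp < 0)
        return -1;
    close(tmp);

    int ro = open(tmpl, O_RDONLY);    /* original descriptor is read-only */
    if (ro < 0) {
        unlink(tmpl);
        return -1;
    }

    int ok = 0;
    int wr = reopen_fd(ro, O_WRONLY); /* fresh open, fresh permission check */
    if (wr >= 0) {
        ok = (write(wr, "x", 1) == 1);
        close(wr);
    }

    close(ro);
    unlink(tmpl);
    return ok;
}
```

Since the file permissions allow the owner to write, the re-open for writing is expected to succeed even though the original descriptor was opened read-only.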

Finally, why does opening a socket fail? strace shows ENXIO being returned. A quick git grep ENXIO fs/*.c reveals:

static int no_open(struct inode *inode, struct file *file)
{
        return -ENXIO;
}

Checking how the code ends up using no_open is left as an exercise for the reader. Also note that systemtap can be used for printf-like debugging without modifying the source code. A probe can also be placed on the 'return' of a function to report the error code.
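The ENXIO failure for sockets can also be reproduced without strace. This is a minimal sketch, assuming /proc is mounted (without it the open would fail with ENOENT instead); reopen_socket_errno is a made-up name for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to re-open one end of a socketpair through /proc/self/fd.
 * Returns the errno from open(), 0 if the open unexpectedly succeeded,
 * or -1 if the socketpair could not be created. */
static int reopen_socket_errno(void)
{
    int sv[2];
    char path[64];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    snprintf(path, sizeof path, "/proc/self/fd/%d", sv[0]);
    int fd = open(path, O_RDWR);
    int err = fd < 0 ? errno : 0;     /* save errno before any close() */

    if (fd >= 0)
        close(fd);
    close(sv[0]);
    close(sv[1]);
    return err;
}
```

Comparing the return value against ENXIO from a small main() confirms the behaviour seen in the strace output.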