Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calling setns from Go returns EINVAL for mnt namespace

The C code works fine and correctly enters the namespace, but the Go code always seems to return EINVAL from the setns call to enter the mnt namespace. I've tried a number of permutations (including embedded C code with cgo and external .so) on Go 1.2, 1.3 and the current tip.

Stepping through the code in gdb shows that both sequences are calling setns in libc the exact same way (or so it appears to me).

I have boiled what seems to be the issue down to the code below. What am I doing wrong?

Setup

I have a shell alias for starting quick busybox containers:

alias startbb='docker inspect --format "{{ .State.Pid }}" $(docker run -d busybox sleep 1000000)'

After running this, startbb will start a container and output it's PID.

lxc-checkconfig outputs:

Found kernel config file /boot/config-3.8.0-44-generic
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: missing
Network namespace: enabled
Multiple /dev/pts instances: enabled

--- Control groups ---
Cgroup: enabled
Cgroup clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: missing
Cgroup cpuset: enabled

--- Misc ---
Veth pair device: enabled
Macvlan: enabled
Vlan: enabled
File capabilities: enabled

uname -a produces:

Linux gecko 3.8.0-44-generic #66~precise1-Ubuntu SMP Tue Jul 15 04:01:04 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Working C code

The following C code works fine:

#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>

main(int argc, char* argv[]) {
    int i;
    char nspath[1024];
    char *namespaces[] = { "ipc", "uts", "net", "pid", "mnt" };

    if (geteuid()) { fprintf(stderr, "%s\n", "abort: you want to run this as root"); exit(1); }

    if (argc != 2) { fprintf(stderr, "%s\n", "abort: you must provide a PID as the sole argument"); exit(2); }

    for (i=0; i<5; i++) {
        sprintf(nspath, "/proc/%s/ns/%s", argv[1], namespaces[i]);
        int fd = open(nspath, O_RDONLY);

        if (setns(fd, 0) == -1) { 
            fprintf(stderr, "setns on %s namespace failed: %s\n", namespaces[i], strerror(errno));
        } else {
            fprintf(stdout, "setns on %s namespace succeeded\n", namespaces[i]);
        }

        close(fd);
    }
}

After compiling with gcc -o checkns checkns.c, the output of sudo ./checkns <PID> is:

setns on ipc namespace succeeded
setns on uts namespace succeeded
setns on net namespace succeeded
setns on pid namespace succeeded
setns on mnt namespace succeeded

Failing Go code

Conversely, the following Go code (which should be identical) doesn't work quite as well:

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "syscall"
)

func main() {
    if syscall.Geteuid() != 0 {
        fmt.Println("abort: you want to run this as root")
        os.Exit(1)
    }

    if len(os.Args) != 2 {
        fmt.Println("abort: you must provide a PID as the sole argument")
        os.Exit(2)
    }

    namespaces := []string{"ipc", "uts", "net", "pid", "mnt"}

    for i := range namespaces {
        fd, _ := syscall.Open(filepath.Join("/proc", os.Args[1], "ns", namespaces[i]), syscall.O_RDONLY, 0644)
        err, _, msg := syscall.RawSyscall(308, uintptr(fd), 0, 0) // 308 == setns

        if err != 0 {
            fmt.Println("setns on", namespaces[i], "namespace failed:", msg)
        } else {
            fmt.Println("setns on", namespaces[i], "namespace succeeded")
        }

    }
}

Instead, running sudo go run main.go <PID> produces:

setns on ipc namespace succeeded
setns on uts namespace succeeded
setns on net namespace succeeded
setns on pid namespace succeeded
setns on mnt namespace failed: invalid argument
like image 476
Iain Lowe Avatar asked Sep 06 '14 20:09

Iain Lowe


1 Answers

(There is an issue filed on the Go project)

So, the answer to this question is that you have to call setns from a single-threaded context. This makes sense since setns should join the current thread to the namespace. Since Go is multi-threaded, you need to make the setns call before the Go runtime threads start.

I think this is because the thread in which the call to syscall.RawSyscall executes is not the main thread -- even with runtime.LockOSThread the result is not what you would expect (ie. that the goroutine is "locked" to the main C thread and therefore equivalent to the constructor trick explained below).

The reply I got after filing the issue suggested using "the cgo constructor trick". I couldn't find any "proper" documentation on this "trick", but it is used in nsinit by Docker/Michael Crosby and even though I went over that code line by line, I didn't try running it this way (see below for frustration).

The "trick" is basically that you can get cgo to execute a C function prior to starting the Go runtime.

To do this, you add the __attribute__((constructor)) macro to decorate the function you want to run before Go starts up:

/*
__attribute__((constructor)) void init() {
    // this code will execute before Go starts up
    // in runs in a single-threaded C context
    // before Go's threads start running
}
*/
import "C"

Using this as a template, I modified checkns.go like this:

/*
#include <sched.h>
#include <stdio.h>
#include <fcntl.h>

__attribute__((constructor)) void enter_namespace(void) {
   setns(open("/proc/<PID>/ns/mnt", O_RDONLY, 0644), 0);
}
*/
import "C"

... rest of file is unchanged ...

This code works, but requires the PID to be hardcoded since it's not being read properly from the commandline input, but it illustrates the idea (and works if you provide a PID from a container started as described above).

It's frustrating because I wanted call setns multiple times but since this C code executes before the Go runtime starts, there is no Go code available.

Update: Shlepping around in the kernel mailing lists provides this link to a conversation that documents this. I can't seem to find it in any actually published manpages, but here's the quote from a patch to setns(2), confirmed by Eric Biederman:

A process may not be reassociated with a new mount namespace if it is multi-threaded. Changing the mount namespace requires that the caller possess both CAP_SYS_CHROOT and CAP_SYS_ADMIN capabilities in its own user namespace and CAP_SYS_ADMIN in the target mount namespace.

like image 132
Iain Lowe Avatar answered Sep 20 '22 00:09

Iain Lowe