The C code works fine and correctly enters the namespace, but the Go code always seems to return EINVAL from the setns call to enter the mnt namespace. I've tried a number of permutations (including embedded C code with cgo and an external .so) on Go 1.2, 1.3 and the current tip. Stepping through the code in gdb shows that both sequences are calling setns in libc the exact same way (or so it appears to me).
I have boiled what seems to be the issue down to the code below. What am I doing wrong?
I have a shell alias for starting quick busybox containers:
alias startbb='docker inspect --format "{{ .State.Pid }}" $(docker run -d busybox sleep 1000000)'
With this alias defined, running startbb starts a container and prints its PID.
lxc-checkconfig outputs:
Found kernel config file /boot/config-3.8.0-44-generic
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: missing
Network namespace: enabled
Multiple /dev/pts instances: enabled
--- Control groups ---
Cgroup: enabled
Cgroup clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: missing
Cgroup cpuset: enabled
--- Misc ---
Veth pair device: enabled
Macvlan: enabled
Vlan: enabled
File capabilities: enabled
uname -a produces:
Linux gecko 3.8.0-44-generic #66~precise1-Ubuntu SMP Tue Jul 15 04:01:04 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
The following C code works fine:
#define _GNU_SOURCE  /* needed for the setns() declaration in <sched.h> */
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>  /* geteuid(), close() */

int main(int argc, char *argv[]) {
    int i;
    char nspath[1024];
    char *namespaces[] = { "ipc", "uts", "net", "pid", "mnt" };

    if (geteuid()) { fprintf(stderr, "%s\n", "abort: you want to run this as root"); exit(1); }
    if (argc != 2) { fprintf(stderr, "%s\n", "abort: you must provide a PID as the sole argument"); exit(2); }

    for (i = 0; i < 5; i++) {
        snprintf(nspath, sizeof nspath, "/proc/%s/ns/%s", argv[1], namespaces[i]);
        int fd = open(nspath, O_RDONLY);
        if (fd == -1) {
            fprintf(stderr, "open %s failed: %s\n", nspath, strerror(errno));
            continue;
        }
        if (setns(fd, 0) == -1) {
            fprintf(stderr, "setns on %s namespace failed: %s\n", namespaces[i], strerror(errno));
        } else {
            fprintf(stdout, "setns on %s namespace succeeded\n", namespaces[i]);
        }
        close(fd);
    }
    return 0;
}
After compiling with gcc -o checkns checkns.c, the output of sudo ./checkns <PID> is:
setns on ipc namespace succeeded
setns on uts namespace succeeded
setns on net namespace succeeded
setns on pid namespace succeeded
setns on mnt namespace succeeded
Conversely, the following Go code (which should be equivalent) doesn't work quite as well:
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

func main() {
	if syscall.Geteuid() != 0 {
		fmt.Println("abort: you want to run this as root")
		os.Exit(1)
	}
	if len(os.Args) != 2 {
		fmt.Println("abort: you must provide a PID as the sole argument")
		os.Exit(2)
	}
	namespaces := []string{"ipc", "uts", "net", "pid", "mnt"}
	for _, ns := range namespaces {
		fd, err := syscall.Open(filepath.Join("/proc", os.Args[1], "ns", ns), syscall.O_RDONLY, 0)
		if err != nil {
			fmt.Println("open of", ns, "namespace failed:", err)
			continue
		}
		// RawSyscall returns (r1, r2, errno); 308 is the setns syscall number on linux/amd64 only.
		_, _, errno := syscall.RawSyscall(308, uintptr(fd), 0, 0)
		if errno != 0 {
			fmt.Println("setns on", ns, "namespace failed:", errno)
		} else {
			fmt.Println("setns on", ns, "namespace succeeded")
		}
		syscall.Close(fd)
	}
}
Instead, running sudo go run main.go <PID> produces:
setns on ipc namespace succeeded
setns on uts namespace succeeded
setns on net namespace succeeded
setns on pid namespace succeeded
setns on mnt namespace failed: invalid argument
(There is an issue filed on the Go project)
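So, the answer to this question is that you have to call setns from a single-threaded context. This makes sense: setns joins the calling thread to the namespace, and for the mnt namespace in particular the kernel rejects the call with EINVAL when the caller shares its filesystem attributes (root directory, working directory and umask, the state that clone(2) shares under CLONE_FS) with other threads. Since Go is multi-threaded, you need to make the setns call before the Go runtime threads start.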
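I think this is because the thread in which the call to syscall.RawSyscall executes is not the main thread. Even with runtime.LockOSThread the result is not what you would expect (i.e., that the goroutine is "locked" to the main C thread and therefore equivalent to the constructor trick explained below). That fits the explanation above: runtime.LockOSThread only pins the goroutine to one OS thread; it does not stop the runtime's other threads, so the process is still multi-threaded when setns runs.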
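The runtime package documents that init functions run on the startup thread and that calling runtime.LockOSThread from an init function makes main run on that thread. A minimal sketch, assuming Linux with /proc mounted and Go 1.16 or later (for os.ReadDir), that uses this to put the main goroutine on the initial thread and then reports how many threads the process has anyway. The helper threadCount is hypothetical, not part of any library.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"syscall"
)

func init() {
	// Init functions run on the startup thread; locking here keeps the
	// main goroutine on that thread when main is invoked.
	runtime.LockOSThread()
}

// threadCount (hypothetical helper) counts the per-thread entries under
// /proc/self/task. Linux only; returns -1 if the directory can't be read.
func threadCount() int {
	entries, err := os.ReadDir("/proc/self/task")
	if err != nil {
		return -1
	}
	return len(entries)
}

func main() {
	// On Linux the initial thread's TID equals the process PID.
	onMain := syscall.Gettid() == os.Getpid()
	fmt.Println("main goroutine on initial thread:", onMain)

	// Locking does not stop the runtime's other threads, so a setns into
	// a mnt namespace from here would still come from a multi-threaded process.
	fmt.Println("threads in process:", threadCount())
}
```

On a typical Linux machine the first line should report true and the second a number greater than one, which is consistent with the mnt setns still failing even from the main thread.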
The reply I got after filing the issue suggested using "the cgo constructor trick". I couldn't find any "proper" documentation on this "trick", but it is used in nsinit by Docker/Michael Crosby, and even though I went over that code line by line, I didn't try running it this way (see below for frustration).
The "trick" is basically that you can get cgo to execute a C function before the Go runtime starts. To do this, you add the GCC __attribute__((constructor)) attribute to the function you want to run before Go starts up:
/*
__attribute__((constructor)) void init() {
    // this code will execute before Go starts up:
    // it runs in a single-threaded C context,
    // before Go's threads start running
}
*/
import "C"
Using this as a template, I modified checkns.go like this:
/*
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
__attribute__((constructor)) void enter_namespace(void) {
    // <PID> must be replaced with a real PID before compiling
    setns(open("/proc/<PID>/ns/mnt", O_RDONLY), 0);
}
*/
import "C"
... rest of file is unchanged ...
This code works, but the PID has to be hardcoded because the constructor doesn't read it from the command-line arguments. Still, it illustrates the idea (and works if you substitute the PID of a container started as described above).
It's frustrating because I wanted to call setns multiple times, but since this C code executes before the Go runtime starts, none of the Go code is available to it.
Update: Shlepping around in the kernel mailing lists turned up a conversation that documents this. I can't find it in any published man pages, but here's the quote from a patch to setns(2), confirmed by Eric Biederman:
A process may not be reassociated with a new mount namespace if it is multi-threaded. Changing the mount namespace requires that the caller possess both CAP_SYS_CHROOT and CAP_SYS_ADMIN capabilities in its own user namespace and CAP_SYS_ADMIN in the target mount namespace.