How do sites like codepad.org and ideone.com sandbox your program?

Question

I need to compile and run user-submitted scripts on my site, similar to what codepad and ideone do. How can I sandbox these programs so that malicious users don't take down my server?

Specifically, I want to lock them inside an empty directory and prevent them from reading or writing anywhere outside of that, from consuming too much memory or CPU, or from doing anything else malicious.

I will need to communicate with these programs via pipes (over stdin/stdout) from outside the sandbox.

Angus · Accepted Answer

codepad.org's sandbox is based on geordi, which runs everything in a chroot (i.e. restricted to a subtree of the filesystem) with resource restrictions, and uses the ptrace API to restrict the untrusted program's use of system calls. See http://codepad.org/about .

I've previously used Systrace, another utility for restricting system calls.

If the policy is set up properly, the untrusted program would be prevented from breaking anything in the sandbox or accessing anything it shouldn't, so there might be no need to put programs in separate chroots and create and delete them for each run. That said, doing so would provide another layer of protection, which probably wouldn't hurt.

thkala · Answer

Some time ago I was searching for a sandbox solution to use in an automated assignment evaluation system for CS students. Much like everything else, there is a trade-off between the various properties:

- Isolation and access control granularity
- Performance and ease of installation/configuration

I eventually decided on a multi-tiered architecture, based on Linux:

Level 0 - Virtualization:

By using one or more virtual machine snapshots for all assignments within a specific time range, it was possible to gain several advantages:
- Clear separation of sensitive from non-sensitive data.
- At the end of the period (e.g. once per day or after each session) the VM is shut down and restarted from the snapshot, thus removing any remnants of malicious or rogue code.
- A first level of computer resource isolation: each VM has limited disk, CPU and memory resources and the host machine is not directly accessible.
- Straightforward network filtering: by having the VM on an internal interface, the firewall on the host can selectively filter the network connections.

  For example, a VM intended for testing students of an introductory programming course could have all incoming and outgoing connections blocked, since students at that level would not have network programming assignments. At higher levels the corresponding VMs could e.g. have all outgoing connections blocked and allow incoming connections only from within the faculty.

It would also make sense to have a separate VM for the Web-based submission system - one that could upload files to the evaluation VMs, but do little else.

Level 1 - Basic operating-system constraints:

On a Unix OS that would contain the traditional access and resource control mechanisms:
- Each sandboxed program could be executed as a separate user, perhaps in a separate chroot jail.
- Strict user permissions, possibly with ACLs.
- ulimit resource limits on processor time and memory usage.
- Execution under nice to reduce priority over more critical processes. On Linux you could also use ionice and cpulimit - I am not sure what equivalents exist on other systems.
- Disk quotas.
- Per-user connection filtering.
You would probably want to run the compiler as a slightly more privileged user, with more memory and CPU time, and access to compiler tools, header files, etc.
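
As a rough illustration of the ulimit and nice items above, here is a minimal Python sketch that applies per-process limits to a child and talks to it over stdin/stdout pipes. It is only a sketch under assumptions: the helper names (`_cap`, `make_limiter`, `run_limited`) and the default limits are made up for this example, it does not switch users or enter a chroot (both need root), and `preexec_fn` should only be used from a single-threaded parent.

```python
import os
import resource
import subprocess


def _cap(which, value):
    """Lower both the soft and hard limit, never exceeding the current hard limit."""
    _, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(which, (value, value))


def make_limiter(cpu_seconds, memory_bytes):
    """Build a function that runs in the child between fork() and exec()."""
    def apply_limits():
        os.nice(10)                              # lower priority, like `nice`
        _cap(resource.RLIMIT_CORE, 0)            # no core dumps
        _cap(resource.RLIMIT_CPU, cpu_seconds)   # CPU time, like `ulimit -t`
        _cap(resource.RLIMIT_AS, memory_bytes)   # address space, like `ulimit -v`
    return apply_limits


def run_limited(argv, stdin_data=b"", cpu_seconds=2,
                memory_bytes=512 * 1024 * 1024, wall_seconds=10):
    """Run argv with limits; feed stdin_data and return (returncode, stdout).

    Raises subprocess.TimeoutExpired if the wall-clock limit is hit
    (for example when the child sleeps instead of burning CPU).
    """
    done = subprocess.run(
        argv,
        input=stdin_data,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        preexec_fn=make_limiter(cpu_seconds, memory_bytes),
        timeout=wall_seconds,
    )
    return done.returncode, done.stdout
```

A call such as `run_limited(["./submission"], stdin_data=b"1 2\n")` (the binary name is hypothetical) returns a negative return code when the kernel kills the child for exceeding its CPU limit. Limits set this way are inherited by anything the child spawns, but they are per-process, so they do not replace disk quotas or a cap on the number of processes.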

Level 2 - Advanced operating-system constraints:

On Linux I consider that to be the use of a Linux Security Module, such as AppArmor or SELinux, to limit access to specific files and/or system calls. Some Linux distributions ship sandboxing security profiles, but it can still be a long and painful process to get something like this working correctly.

Level 3 - User-space sandboxing solutions:

I have successfully used Systrace on a small scale, as mentioned in this older answer of mine. There are several other sandboxing solutions for Linux, such as libsandbox. Such solutions may provide more fine-grained control over the system calls that may be used than LSM-based alternatives, but can have a measurable impact on performance.

Level 4 - Preemptive strikes:

Since you will be compiling the code yourself, rather than executing existing binaries, you have a few additional tools in your hands:
- Restrictions based on code metrics; e.g. a simple "Hello World" program should never be larger than 20-30 lines of code.
- Selective access to system libraries and header files; if you don't want your users to call connect() you might just restrict access to socket.h.
- Static code analysis; disallow assembly code, "weird" string literals (i.e. shell-code) and the use of restricted system functions.
A competent programmer might be able to get around such measures, but as the cost-to-benefit ratio increases they would be far less likely to persist.
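
The checks listed above could be sketched roughly as follows in Python. Everything here is an assumption made for illustration: the function name `precheck`, the 30-line limit, and the regular expressions are not taken from any real tool, and a textual scan like this is easy to evade, so it only makes sense as a cheap first filter in front of the lower levels.

```python
import re

MAX_LINES = 30  # hypothetical limit for a "Hello World"-sized assignment

# Hypothetical patterns; a real deployment would tune these per assignment.
FORBIDDEN_PATTERNS = {
    "inline assembly": re.compile(r"\b(?:asm|__asm__|__asm)\b"),
    "socket header": re.compile(r'#\s*include\s*[<"]sys/socket\.h[>"]'),
    "restricted call": re.compile(
        r"\b(?:system|fork|popen|connect|exec[lv]p?e?)\s*\("
    ),
    "long hex-escaped string": re.compile(r'"(?:\\x[0-9a-fA-F]{2}){8,}'),
}


def precheck(source: str) -> list[str]:
    """Return a list of human-readable problems found in C/C++ source text."""
    problems = []
    line_count = len(source.splitlines())
    if line_count > MAX_LINES:
        problems.append(f"too many lines ({line_count} > {MAX_LINES})")
    for label, pattern in FORBIDDEN_PATTERNS.items():
        if pattern.search(source):
            problems.append(label)
    return problems
```

An empty list means the submission passed this filter and can be handed to the compiler; anything else can be rejected and logged before any code runs.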

Level 0-5 - Monitoring and logging:

You should be monitoring the performance of your system and logging all failed attempts. Not only would you be more likely to interrupt an in-progress attack at a system level, but you might be able to make use of administrative means to protect your system, such as:
- calling whatever security officials are in charge of such issues.
- finding that persistent little hacker of yours and offering them a job.

The degree of protection that you need and the resources that you are willing to expend to set it up are up to you.

liuyu · Answer

I am the developer of libsandbox mentioned by @thkala, and I do recommend it for use in your project.

Some additional comments on @thkala's answer:

- It is fair to classify libsandbox as a user-land tool, but libsandbox does integrate standard OS-level security mechanisms (i.e. chroot, setuid, and resource quota).
- Restricting access to C/C++ headers, or static analysis of users' code, does NOT prevent system functions like connect() from being called. This is because user code can (1) declare function prototypes itself without including system headers, or (2) invoke the underlying kernel-land system calls without touching the wrapper functions in libc.
- Compile-time protection also deserves attention, because malicious C/C++ code can exhaust your CPU with infinite template recursion or preprocessor macro expansion.
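
To illustrate the last point, one way to bound the compile step is to give the compiler a wall-clock deadline and kill its whole process group when the deadline passes. The Python sketch below is an assumption-laden example, not how libsandbox works: the helper name `run_with_deadline` is invented, and the compiler command in the usage note is only a guess at a typical invocation.

```python
import os
import signal
import subprocess


def run_with_deadline(argv, wall_seconds, cwd=None):
    """Run argv; kill its process group if it runs longer than wall_seconds.

    Returns (returncode, combined_output, timed_out).
    """
    proc = subprocess.Popen(
        argv,
        cwd=cwd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # child becomes leader of its own process group
    )
    try:
        output, _ = proc.communicate(timeout=wall_seconds)
        return proc.returncode, output, False
    except subprocess.TimeoutExpired:
        try:
            # Compiler drivers spawn helpers, so kill the whole group.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group vanished between the timeout and the kill
        output, _ = proc.communicate()
        return proc.returncode, output, True
```

For example, `run_with_deadline(["g++", "-ftemplate-depth=64", "submission.cpp", "-o", "submission"], wall_seconds=10)` would stop a compile that is stuck in template or macro expansion; the file names and the depth value are placeholders. A wall-clock deadline does not limit memory, so it is best combined with the resource limits discussed in @thkala's Level 1.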

Tags: language-agnostic, operating-system, sandbox, system-calls

mpen
