I am writing computationally heavy code for a server (in C/C++). In the inner loops, I need to call external user functions millions of times, so they have to run at native speed and invoking them should cost no more than a C function call. Each time I receive a user function in source form, I will automatically compile it to a binary that the main code links dynamically.
Those functions will only be used as simple math kernels, e.g. in pseudo-C:
Function f(double x) -> double {
    return x * x;
}
or with array access:
Function f(double* ar, int length) -> double {
    double sum = 0;
    for(i = 0 to length) {
        sum = sum + ar[i];
    }
    return sum;
}
or with basic math library calls:
Function f(double x) -> double {
    return cos(x);
}
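To make the "no more overhead than a C function call" requirement concrete, here is a minimal sketch of the host side in C. The names `scalar_kernel`, `user_square` and `apply_kernel` are made up for illustration; in the real setup the function pointer would come from the dynamically linked user binary rather than from a function defined in the same file.

```c
#include <stddef.h>

/* Hypothetical signature shared by all scalar user kernels. */
typedef double (*scalar_kernel)(double);

/* Stand-in for a compiled user function (assumption: the real one
   would be loaded from the user's binary at run time). */
double user_square(double x) {
    return x * x;
}

/* Inner loop of the host: applies the kernel to every element in place.
   Each iteration costs one indirect call through the pointer. */
void apply_kernel(scalar_kernel f, double *data, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        data[i] = f(data[i]);
    }
}
```

Calling `apply_kernel(user_square, data, n)` squares `n` values in place. Any safety mechanism that adds work per call or per memory access shows up directly in this loop, which is why supervisors and garbage collectors are ruled out below.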
However, they have to be safe for the server. It's OK if they fail to halt (Turing completeness), but not if they access process memory that is not their own, make system calls, or cause a stack overflow. To generalize, the external code must not "be able to hack the server code".
So my question: I'm wondering if there is a safe-by-design language with an LLVM frontend (no pointers etc., bounds checking for arrays/stack, isolation of system calls) and with no speed penalties (from supervisors or garbage collectors) that I can use. LLVM is not necessary, but it's preferred.
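As an illustration of what "bounds checking for arrays" would mean for the sum kernel, here is a rough sketch in C of the checks such a language would have to insert. The `checked_array` type and both functions are hypothetical and only show the idea; they are not taken from any existing language or library.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat" array: the length travels with the data. */
typedef struct {
    const double *data;
    size_t length;
} checked_array;

/* Checked load: writes the element to *out and returns true, or
   returns false without reading memory if i is out of range. */
bool checked_get(checked_array a, size_t i, double *out) {
    if (i >= a.length) {
        return false;
    }
    *out = a.data[i];
    return true;
}

/* The sum kernel from above, written against checked loads only. */
bool checked_sum(checked_array a, double *sum_out) {
    double sum = 0.0;
    for (size_t i = 0; i < a.length; ++i) {
        double v;
        if (!checked_get(a, i, &v)) {
            return false;
        }
        sum += v;
    }
    *sum_out = sum;
    return true;
}
```

In a loop like this the index is provably below `length`, so an optimizer can often drop the check entirely. The cost of bounds checking mostly matters for indices the compiler cannot reason about.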
I had a look at Mozilla's "Rust", but it doesn't seem to be safe enough [rust-dev].
If there is no such language, my fallback option right now is to use a Node.js sandboxed VM.
I believe that such a language, if kept simple, is feasible, but does it exist?
The type of language doesn't matter. A toy language with a simplistic design and easy-to-prove safety would do.
EDIT: Concerning system calls and harmful dependencies: for any language, it should be easy enough to isolate them with plain bash. Just try to link the produced .bc with no libraries. If linking fails, the .bc has external dependencies, so drop it. Since LLVM IR is otherwise totally harmless, the only thing the language has to guarantee is safe memory access.
I would really like to add a comment, but Stack Overflow is preventing me, so I'll add it as an answer. Perhaps it will be useful.
You might try looking at https://github.com/andoma/vmir. I have been working with it a bit in the hope of sandboxing arbitrary C++/Swift code. I think it might be possible to create a "safe" interpreter/JIT.
You can control all functions which are called, and you can control how memory is accessed. So I think (and hope) that I can modify the JIT and interpreter enough to reject code that is inherently unsafe and to enforce memory boundaries and function restrictions.
Having distinct processes à la PNaCl is the obvious sandboxing choice, but the overhead is substantial. I believe the sandboxing there is done per process.
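To show roughly what I mean by function restrictions and memory boundaries, here is a small sketch in C. It is not vmir's API; the whitelist contents, the `guest_memory` type and both function names are assumptions made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical whitelist of external symbols guest code may call. */
static const char *const allowed_symbols[] = { "cos", "sin", "sqrt", "exp" };

/* Returns true only for names on the whitelist; everything else is
   rejected before the guest code is ever run. */
bool symbol_is_allowed(const char *name) {
    size_t n = sizeof allowed_symbols / sizeof allowed_symbols[0];
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(name, allowed_symbols[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical guest memory: every guest address is an offset
   into one buffer owned by the host. */
typedef struct {
    uint8_t *base;
    size_t size;
} guest_memory;

/* Bounds-checked load of a double at a guest offset. The comparison
   is written so that it cannot overflow for large offsets. */
bool guest_load_double(const guest_memory *m, size_t offset, double *out) {
    if (offset > m->size || m->size - offset < sizeof(double)) {
        return false;
    }
    memcpy(out, m->base + offset, sizeof(double));
    return true;
}
```

With this shape, a call to anything like `system` fails the whitelist check, and a load that would run past the end of the guest buffer returns false instead of touching host memory.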