
What C-integration problems arise with stackless VM implementations?

By stackless VM I mean an implementation that maintains its own stack on the heap instead of using the system "C stack". This has a lot of advantages, such as continuations and serializable state, but it also has some disadvantages when it comes to C bindings, especially C-VM-C (or VM-C-VM) callbacks.

The question is: what exactly are these disadvantages? Could anyone give a good example of a real problem?

Oleg Andreev asked Apr 30 '09 09:04


2 Answers

It sounds like you're already familiar with some of the advantages and disadvantages.

Some further advantages of an explicit stack:

a) It makes proper tail call optimization possible even if the underlying implementation has no support for it.
b) It is easier to construct things like a language-level "stack trace".
c) It is easier to add proper continuations, as you pointed out.

I recently wrote a simple "Scheme" interpreter in C#, which initially used the .NET stack. I then re-wrote it to use an explicit stack - so perhaps the following will help you:

The first version used the implicit .NET runtime stack...

Initially, it was just a class hierarchy, with different forms (Lambda, Let, etc.) being implementations of the following interface:

// A "form" is an expression that can be evaluated with
// respect to an environment
// e.g.
// "(* x 3)"
// "x"
// "3"
public interface IForm
{
    object Evaluate(IEnvironment environment);
}

IEnvironment looked as you'd expect:

/// <summary>
/// Fundamental interface for resolving "symbols" subject to scoping.
/// </summary>
public interface IEnvironment
{
    object Lookup(string name);
    IEnvironment Extend(string name, object value);
}

For adding "builtins" to my Scheme interpreter, I initially had the following interface:

/// <summary>
/// A function is either a builtin function (i.e. implemented directly in CSharp)
/// or something that's been created by the Lambda form.
/// </summary>
public interface IFunction
{
    object Invoke(object[] args);
}

That was when it used the implicit .NET runtime stack. There was definitely less code, but it was impossible to add things like proper tail recursion, and most importantly, it was awkward for my interpreter to be able to provide a "language level" stack trace in the case of a runtime error.

So I rewrote it to have an explicit (heap allocated) stack.

My "IFunction" interface had to change to the following, so that I could implement things like "map" and "apply", which call back into the Scheme interpreter:

/// <summary>
/// A function that wishes to use the thread state to
/// evaluate its arguments. The function should either:
/// a) Push tasks on to threadState.Pending which, when evaluated, will
///   result in the result being placed on to threadState.Results
/// b) Push its result directly on to threadState.Results
/// </summary>
public interface IStackFunction
{
    void Evaluate(IThreadState threadState, object[] args);
}

And IForm changed to:

public interface IForm
{
    void Evaluate(IEnvironment environment, IThreadState s);
}

Where IThreadState is as follows:

/// <summary>
/// The state of the interpreter.
/// The implementation of a task which takes some arguments,
/// call them "x" and "y", and which returns an argument "z",
/// should follow the following protocol:
/// a) Call "PopResult" to get x and y
/// b) Either
///   i) push "z" directly onto IThreadState using PushResult OR
///   ii) push a "task" on to the stack which will result in "z" being
///       pushed on to the result stack.
/// 
/// Note that ii) is "recursive" in its definition - that is, a task
/// that is pushed on to the task stack may in turn push other tasks
/// on the task stack which, when evaluated, 
/// ... ultimately will end up pushing the result via PushResult.
/// </summary>
public interface IThreadState
{
    void PushTask(ITask task);
    object PopResult();
    void PushResult(object result);
}

And ITask is:

public interface ITask
{
    void Execute(IThreadState s);
}

And my main "event" loop is:

ThreadState threadState = new ThreadState();
threadState.PushTask(null); // sentinel: popping it ends the loop below
threadState.PushTask(new EvaluateForm(f, environment));
ITask next = null;

while ((next = threadState.PopTask()) != null)
    next.Execute(threadState);

return threadState.PopResult(); // Get what EvaluateForm evaluated to

EvaluateForm is just a task that calls IForm.Evaluate with a specific environment.
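To make the protocol concrete, here is a minimal sketch of how the pieces could fit together. The ThreadState implementation and the Constant, Add and AddConstants tasks are hypothetical stand-ins invented for illustration (my interpreter's real classes are not shown here); only the interfaces and the shape of the loop come from the code above.

```csharp
using System;
using System.Collections.Generic;

public interface ITask { void Execute(IThreadState s); }

public interface IThreadState
{
    void PushTask(ITask task);
    object PopResult();
    void PushResult(object result);
}

// Hypothetical concrete state: two ordinary heap-allocated stacks.
public sealed class ThreadState : IThreadState
{
    private readonly Stack<ITask> pending = new Stack<ITask>();
    private readonly Stack<object> results = new Stack<object>();

    public void PushTask(ITask task) { pending.Push(task); }
    public ITask PopTask() { return pending.Pop(); }
    public void PushResult(object result) { results.Push(result); }
    public object PopResult() { return results.Pop(); }
}

// Pushes its value directly onto the result stack (case b.i of the protocol).
public sealed class Constant : ITask
{
    private readonly object value;
    public Constant(object value) { this.value = value; }
    public void Execute(IThreadState s) { s.PushResult(value); }
}

// Pops two operands and pushes their sum.
public sealed class Add : ITask
{
    public void Execute(IThreadState s)
    {
        int y = (int)s.PopResult(); // pushed last, so popped first
        int x = (int)s.PopResult();
        s.PushResult(x + y);
    }
}

// Schedules further tasks instead of computing anything itself (case b.ii).
public sealed class AddConstants : ITask
{
    private readonly int x, y;
    public AddConstants(int x, int y) { this.x = x; this.y = y; }
    public void Execute(IThreadState s)
    {
        // Tasks run in reverse push order, so Add is pushed first.
        s.PushTask(new Add());
        s.PushTask(new Constant(y));
        s.PushTask(new Constant(x));
    }
}

public static class Interpreter
{
    public static object Run(ITask root)
    {
        var state = new ThreadState();
        state.PushTask(null); // sentinel that ends the loop
        state.PushTask(root);
        ITask next;
        while ((next = state.PopTask()) != null)
            next.Execute(state);
        return state.PopResult();
    }
}
```

With these definitions, Interpreter.Run(new AddConstants(2, 3)) evaluates to 5. Note that no task ever calls another task directly: the .NET stack never grows beyond the loop plus one Execute call, however deeply the Scheme code nests.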

Personally, I found this new version much "nicer" to work with from an implementation point of view: it is easy to get a stack trace, and easy to make it implement full continuations. I haven't done the latter yet: I would need to make my "stacks" persistent linked lists rather than using the C# Stack, and have ITask "return" the new ThreadState rather than mutate it, so that I can have a "call-continuation" task.
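As a rough idea of what a persistent stack could look like, here is a small sketch. PStack and Demo are hypothetical names, and this is not code from my interpreter; it only shows why an immutable linked list makes capturing a continuation cheap.

```csharp
using System;

// Hypothetical immutable (persistent) stack: Push and Pop return new stacks
// and never modify an existing one.
public sealed class PStack<T>
{
    // The empty stack is the only node whose Tail is null.
    public static readonly PStack<T> Empty = new PStack<T>(default(T), null);

    public T Head { get; }
    public PStack<T> Tail { get; }
    public bool IsEmpty { get { return Tail == null; } }

    private PStack(T head, PStack<T> tail) { Head = head; Tail = tail; }

    public PStack<T> Push(T value) { return new PStack<T>(value, this); }

    public PStack<T> Pop()
    {
        if (IsEmpty) throw new InvalidOperationException("stack is empty");
        return Tail;
    }
}

public static class Demo
{
    public static string[] Run()
    {
        PStack<string> tasks = PStack<string>.Empty.Push("a").Push("b");
        PStack<string> captured = tasks;  // "capturing" is just keeping a reference
        tasks = tasks.Pop().Push("c");    // execution carries on with a new stack
        return new[] { captured.Head, tasks.Head }; // "b" and "c"
    }
}
```

Because nothing is ever mutated, the captured stack stays valid no matter what the interpreter does afterwards, and the two stacks share their common tail instead of copying it.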

Basically, you're just less dependent on the underlying language implementation.

About the only downside I can find is performance... But in my case, it's just an interpreter so I don't care that much about performance anyway.

I'd also point you to this very nice article on the benefits of re-writing recursive code as iterative code with a stack, by one of the authors of the KAI C++ compiler: Considering Recursion

Paul Hollingsworth answered Nov 15 '22 16:11


After an e-mail conversation with Steve Dekorte (author of the Io programming language) and Konstantin Olenin, I found a problem and a (partial) solution to it. Imagine a call from the VM to a C function, which in turn calls back into a VM method. While the VM executes the callback, part of the VM state lies outside the VM: on the C stack and in registers. If you saved the VM state at that moment, you would be guaranteed not to be able to restore it correctly the next time the VM is loaded.

The solution is to model the VM as a message-receiving actor: the VM can send async notifications to the native code, and the native code can send async notifications to the VM. That way, in a single-threaded environment, whenever the VM gains control, no additional state is stored outside of it (except data irrelevant to the VM runtime).
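A minimal single-threaded sketch of this idea follows, written in C# to match the other answer rather than in C. ActorVm, Host and the message strings are all made up for illustration and are not Io's actual API; the point is only that neither side ever calls the other synchronously.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical VM modelled as an actor: it only reacts to queued messages.
public sealed class ActorVm
{
    private readonly Queue<string> inbox = new Queue<string>();
    private readonly Queue<string> outbox = new Queue<string>(); // requests for native code
    public readonly List<string> Log = new List<string>();       // stands in for VM state

    public void Post(string message) { inbox.Enqueue(message); }

    // Handles at most one message, then returns to the host.
    public bool Step()
    {
        if (inbox.Count == 0) return false;
        string message = inbox.Dequeue();
        Log.Add(message);
        if (message == "start")
            outbox.Enqueue("compute"); // ask native code; do not call it directly
        return true;
    }

    public bool TryTakeRequest(out string request)
    {
        if (outbox.Count == 0) { request = null; return false; }
        request = outbox.Dequeue();
        return true;
    }
}

public static class Host
{
    public static void Run(ActorVm vm)
    {
        vm.Post("start");
        bool progressed = true;
        while (progressed)
        {
            progressed = vm.Step();
            // Step has returned, so no VM frame is left on the host stack here.
            string request;
            while (vm.TryTakeRequest(out request))
            {
                // Native work happens here; the answer goes back as a message.
                vm.Post("reply:" + request);
                progressed = true;
            }
        }
    }
}
```

After Host.Run(vm), vm.Log contains "start" followed by "reply:compute". The native "callback" became an ordinary message, so there is never a VM-C-VM chain of frames that would have to be saved along with the VM.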

This does not mean that you can correctly restore the VM state under all circumstances, but at least you can build your own reliable system on top of it.

Oleg Andreev answered Nov 15 '22 16:11