Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to safely translate tree data structures between C++ / Ocaml?

I have a legacy data structure that is written in C++ and a new tool in OCaml that is expected to deal with that legacy data. So I need to import/translate data from the former to the latter. The data is in the form of a tree and usually handled by visitors.

As a simple example consider this minimal DSL:

#include <memory>

using namespace std;

class intnode;
class addnode;

struct visitor {
    virtual void visit(const intnode& n) = 0;
    virtual void visit(const addnode& n) = 0;
};

struct node {
    virtual void accept(visitor& v) = 0;
};

struct intnode : public node {
    int x;

    virtual void accept(visitor& v) { v.visit(*this); }
};

struct addnode : public node {
    shared_ptr<node> l;
    shared_ptr<node> r;

    virtual void accept(visitor& v) { v.visit(*this); }
};

Its representation in OCaml looks like this:

type node = Int of int
          | Plus of node * node

let make_int x = Int x
let make_plus l r = Plus(l,r)

The question is, how do I safely and efficiently transform the C++ tree into its OCaml representation?

So far, I have two approaches:

Approach 1

Write a visitor that invokes the OCaml constructors and yields a value, e.g. something like this:

value translate(shared_ptr<node> n);

struct translator : public visitor {
    value retval;

    virtual visit(const intnode& n) {
        retval = call(make_int, Val_int(x->value));
    }

    virtual visit(const addnode& n) {
        value l = translate(n.l);
        value r = translate(n.r);
        retval =  call(make_add, l, r);
    }
};

value translate(shared_ptr<node> n)
{
    translator t;
    t.visit(*n);
}

Simply assume that call does all the required scaffolding to call back into OCaml and invoke the correct constructor.

The problem with that approach is OCaml's garbag collector. If the GC runs, while the C++ side has some value on is stack, that value (which is after all a pointer into OCaml's heap) might be invalidated. So I need some way to inform OCaml about the fact that the values are still needed. Usually this is done with the CAML* macros, but how do I do this in a case like this? Can I use these macros inside the visit methods?

Approach 2

The second approach is more complicated. When there is no way to safely store intermediate references, I could turn things around and push the C++ pointers into the OCaml heap:

type cppnode (* C++ pointer *)

type functions = {
  transl_plus : cppnode -> cppnode -> node;
  transl_int : int -> node;
}

external dispatch : functions -> cppnode -> node = "dispatch_transl"

let rec translate n = dispatch {transl_plus; transl_int = make_int} n

and transl_plus a b = make_plus (translate a) (translate b)

The idea here is that the function "dispatch" will wrap all sub nodes into CustomVal structures and pass them to OCaml without storing any intermediate values. The corresponding visitor would only implement pattern matching. This should clearly work with the GC, but has the downside of being slightly less efficient (because of the pointer wrapping) and potentially less readable (because of the distinction between dispatch and reconstruction).

Is there any way to obtain the safety of approach 2 with the elegance of approach 1?

like image 323
choeger Avatar asked Oct 04 '17 10:10

choeger


1 Answers

I don't see any problems in constructing OCaml values on C stack even in the recursive case. In your example, you're using a structure member to store an OCaml heap value. That's also possible, however, you need to use caml_register_global_root or caml_register_generational_root and to release them with caml_remove_global_root or caml_remove_generational_global_root. In fact, you may even build a smart pointer that will hold OCaml values.

With all these said, I still don't see any reasons (at least for the simplified example, that you demonstrated) why you should go into class members for that, this is how I would solve it:

struct translator : public visitor {

    virtual value visit(const intnode& n) {
        CAMLparam0();
        CAMLlocal1(x);
        x = call(make_int, Val_int(n->value);
        CAMLreturn(x);
    }

    virtual value visit(const addnode& n) {
        CAMLparam0();
        CAMLlocal(l,r,x);
        l = visit(*n.l);
        r = visit(*n.r);
        x = call(make_add, l, r);
        CAMLreturn(x);
    }
};

This, of course, assumes, that you have a visitor that can return values of arbitrary types. If you don't have one, and don't want to implement one, then you can definitely build your value incrementally:

value translate(shared_ptr<node> n);

class builder : public visitor {
    value result;
 public:
    builder() {
       result = Val_unit; // or any better default
       caml_register_generational_global_root(&result);
    }

    virtual ~builder() {
       caml_remove_generational_global_root(&result);
    }

    virtual void visit(const intnode& n) {
        CAMLparam0();
        CAMLlocal1(x);
        x = call(make_int, Val_int(n->value);
        caml_modify_generational_global_root(&result, x);
        CAMLreturn0;
    }

    virtual void visit(const addnode& n) {
        CAMLparam0();
        CAMLlocal(l,r,x);
        l = translate(n.l);
        r = translate(n.r);
        x = call(make_add, l, r);
        caml_modify_generational_global_root(&result,x)
        CAMLreturn0;
    }
};

value translate(share_ptr<node> node) {
  CAMLparam0();
  CAMLlocal1(x);
  builder b;
  b.visit(*node);
  x = b.result;
  CAMLreturn(x);
}

You may also look at Berke Durak's Aurochs project, that builds parse trees in place using C.

like image 79
ivg Avatar answered Nov 10 '22 10:11

ivg