
Compiling Erlang To Javascript Via Core Erlang

So I started making progress on LuvvieScript, and then it all kicked off a bit on Twitter: https://twitter.com/gordonguthrie/status/389659700741943296

Anthony Ramine (https://twitter.com/nokusu) made the point that I was doing it wrong: I should be compiling from Erlang to JavaScript via Core Erlang, not via the Erlang AST. For me this is both a compelling and an unattractive option... Twitter is not the right medium for that discussion, so I thought I would write it up here and get some advice.

Strategic Overview

LuvvieScript has three core requirements:

  • a valid subset of Erlang that compiles to sane and performant JavaScript
  • a complete Source Map, so that code can be debugged in the browser as LuvvieScript rather than as JavaScript
  • a 'runtime' client-side JavaScript environment (with server-side comms) in which to execute LuvvieScript modules (a sort of in-page supervisor...)
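
For the Source Map requirement, everything ends up in the v3 "mappings" string. In that string, each generated position points back to an original line and column, encoded as Base64 VLQ numbers. Below is a minimal JavaScript sketch of that encoding. It assumes a single source file and a single generated line. encodeVlq and encodeLine are illustrative names and do not come from LuvvieScript or any library:

```javascript
// Base64 alphabet used by Source Map v3 "mappings" strings.
const BASE64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode one signed integer as a Base64 VLQ string.
function encodeVlq(value) {
  // The sign goes in the lowest bit: 1 -> 2, -1 -> 3.
  let vlq = value < 0 ? (-value << 1) | 1 : value << 1;
  let out = "";
  do {
    let digit = vlq & 31; // low 5 bits
    vlq >>>= 5;
    if (vlq > 0) digit |= 32; // continuation bit: more digits follow
    out += BASE64_CHARS[digit];
  } while (vlq > 0);
  return out;
}

// Encode the segments of ONE generated line of a single-source map.
// Each segment is {genCol, srcLine, srcCol}, all 0-based. Every field is
// stored as a delta from the previous segment; the source index is always
// 0 here, so its delta is always 0. (Across multiple lines the srcLine and
// srcCol deltas carry over; that is left out of this sketch.)
function encodeLine(segments) {
  let prevGenCol = 0;
  let prevSrcLine = 0;
  let prevSrcCol = 0;
  return segments
    .map(({ genCol, srcLine, srcCol }) => {
      const segment =
        encodeVlq(genCol - prevGenCol) +
        encodeVlq(0) +
        encodeVlq(srcLine - prevSrcLine) +
        encodeVlq(srcCol - prevSrcCol);
      prevGenCol = genCol;
      prevSrcLine = srcLine;
      prevSrcCol = srcCol;
      return segment;
    })
    .join(",");
}
```

For example, a segment at generated column 0 that comes from line 19, column 1 of the Erlang source is stored 0-based as line 18, column 0, and encodes as AAkBA.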

The third of these is kinda out of scope for this debate, but the first two are core.

There is a lazy-gits corollary: I want to use as many existing Erlang and JavaScript syntax tools (lexers, parsers, tokenizers, AST transforms, etc., etc., etc.) as possible and write as little code as possible.

Current Thinking

The code is currently written with the following structure:

  • compile the code to the Erlang AST (which has line numbers)
  • tokenise the code (keeping comments and white space) and use those tokens to build a dictionary that maps line/column info to tokens
  • merge the dictionary and AST to give a line/col AST (with some fannying about to group fns of different arities)
  • transform this new Erlang AST to a JavaScript AST as implemented in the SpiderMonkey Parser API https://developer.mozilla.org/en-US/docs/SpiderMonkey/Parser_API
  • use JavaScript utils like brushtail to mutate away tail calls in the JavaScript AST https://github.com/puffnfresh/brushtail
  • use JavaScript utils like ESCodeGen to emit the JavaScript https://github.com/Constellation/escodegen
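
The brushtail step matters because JavaScript engines generally do not eliminate tail calls. As a result, tail-recursive Erlang functions that are translated naively will overflow the stack. The hand-written sketch below shows the general shape of the rewrite: a self tail call becomes parameter reassignment inside a loop. It illustrates the idea only and is not brushtail's actual output:

```javascript
// Erlang:  sum(0, Acc) -> Acc;
//          sum(N, Acc) -> sum(N - 1, Acc + N).
// A direct translation keeps the recursive tail call, so every iteration
// adds a JavaScript stack frame.
function sumRecursive(n, acc) {
  if (n === 0) return acc;
  return sumRecursive(n - 1, acc + n);
}

// The same function with the self tail call turned into a loop: the new
// argument values are computed first, then assigned to the parameters,
// and control goes back to the top instead of making a call.
function sumLoop(n, acc) {
  while (true) {
    if (n === 0) return acc;
    const nextN = n - 1;
    const nextAcc = acc + n;
    n = nextN;
    acc = nextAcc;
  }
}

console.log(sumLoop(100000, 0)); // 5000050000, in constant stack space
```

Computing every new argument before assigning any of them matters when one argument depends on another parameter's old value, as Acc + N does here.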

Basically I get an Erlang AST that looks something like this:

 [{function,
      {19,{1,9}},
      atom1_fn,0,
      [{clause,
           {19,none},
           [],
           [[]],
           [{match,
                {20,none},
                [{var,{20,{5,6}},'D'}],
                [{atom,{20,{11,15}},blue}]},
             {var,{21,{5,6}},'D'}]}]}]
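
The core of the transpose step is turning each Erlang position into an ESTree loc. Below is a minimal JavaScript sketch built on two assumptions. First, the Erlang terms have been mirrored into JavaScript arrays, with tuples as arrays and atoms as strings. Second, the {Line,{StartCol,EndCol}} pairs are 1-based with an exclusive end column, which is what atom1_fn at {1,9} suggests. Mapping atoms to string literals is a hypothetical choice, and the sketch only covers two node types:

```javascript
// Assumption: Erlang terms mirrored into JavaScript, tuples as arrays and
// atoms as strings, e.g. {atom,{20,{11,15}},blue} -> ["atom", [20, [11, 15]], "blue"].

// Convert an Erlang {Line, {StartCol, EndCol}} position into an ESTree
// SourceLocation. Erlang columns are assumed 1-based with an exclusive end;
// ESTree columns are 0-based, so both columns are shifted down by one.
function toLoc([line, [startCol, endCol]]) {
  return {
    start: { line, column: startCol - 1 },
    end: { line, column: endCol - 1 },
  };
}

// Hypothetical mapping for two node types: atoms become string literals
// and variables become identifiers. A real transform would cover the
// whole Erlang abstract format.
function toEstree(node) {
  const [tag, pos, value] = node;
  switch (tag) {
    case "atom":
      return { type: "Literal", value, raw: JSON.stringify(value), loc: toLoc(pos) };
    case "var":
      return { type: "Identifier", name: value, loc: toLoc(pos) };
    default:
      throw new Error(`unsupported node type: ${tag}`);
  }
}

console.log(JSON.stringify(toEstree(["var", [20, [5, 6]], "D"]).loc));
```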

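and I then transpose it into a JavaScript AST in the SpiderMonkey Parser API's JSON form. The shape looks like this (this particular example is the AST for the statement "var answer = 6 * 7;" rather than for the function above):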

{
    "type": "Program",
    "body": [
        {
            "type": "VariableDeclaration",
            "declarations": [
                {
                    "type": "VariableDeclarator",
                    "id": {
                        "type": "Identifier",
                        "name": "answer",
                        "loc": {
                            "start": {
                                "line": 2,
                                "column": 4
                            },
                            "end": {
                                "line": 2,
                                "column": 10
                            }
                        }
                    },
                    "init": {
                        "type": "BinaryExpression",
                        "operator": "*",
                        "left": {
                            "type": "Literal",
                            "value": 6,
                            "raw": "6",
                            "loc": {
                                "start": {
                                    "line": 2,
                                    "column": 13
                                },
                                "end": {
                                    "line": 2,
                                    "column": 14
                                }
                            }
                        },
                        "right": {
                            "type": "Literal",
                            "value": 7,
                            "raw": "7",
                            "loc": {
                                "start": {
                                    "line": 2,
                                    "column": 17
                                },
                                "end": {
                                    "line": 2,
                                    "column": 18
                                }
                            }
                        },
                        "loc": {
                            "start": {
                                "line": 2,
                                "column": 13
                            },
                            "end": {
                                "line": 2,
                                "column": 18
                            }
                        }
                    },
                    "loc": {
                        "start": {
                            "line": 2,
                            "column": 4
                        },
                        "end": {
                            "line": 2,
                            "column": 18
                        }
                    }
                }
            ],
            "kind": "var",
            "loc": {
                "start": {
                    "line": 2,
                    "column": 0
                },
                "end": {
                    "line": 2,
                    "column": 19
                }
            }
        }
    ],
    "loc": {
        "start": {
            "line": 2,
            "column": 0
        },
        "end": {
            "line": 2,
            "column": 19
        }
    }
}
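
The Source Map can only be as complete as these loc fields. One cheap safeguard is to walk the tree before code generation and list any node that has a type but no loc. The helper below is illustrative and is not part of ESCodeGen or the Parser API:

```javascript
// Walk an ESTree / SpiderMonkey Parser API tree and collect the types of
// nodes that have no "loc"; each one would be a hole in the Source Map.
function nodesMissingLoc(node, missing = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => nodesMissingLoc(child, missing));
  } else if (node !== null && typeof node === "object") {
    if (typeof node.type === "string" && !node.loc) missing.push(node.type);
    for (const [key, value] of Object.entries(node)) {
      if (key !== "loc") nodesMissingLoc(value, missing);
    }
  }
  return missing;
}
```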

El Problemo

Anthony's point is well made: Core Erlang is a simpler and more regular language than Erlang and should be easier to transpile to JavaScript than plain Erlang, but it is not very well documented.

I can get an AST-like representation of Core Erlang easily enough:

{c_module,[],
    {c_literal,[],basic_types},
    [{c_var,[],{atom1_fn,0}},
     {c_var,[],{atom2_fn,0}},
     {c_var,[],{bish_fn,1}},
     {c_var,[],{boolean_fn,0}},
     {c_var,[],{float_fn,0}},
     {c_var,[],{int_fn,0}},
     {c_var,[],{module_info,0}},
     {c_var,[],{module_info,1}},
     {c_var,[],{string_fn,0}}],
    [],
    [{{c_var,[],{int_fn,0}},{c_fun,[],[],{c_literal,[],1}}},
     {{c_var,[],{float_fn,0}},{c_fun,[],[],{c_literal,[],2.3}}},
     {{c_var,[],{boolean_fn,0}},{c_fun,[],[],{c_literal,[],true}}},
     {{c_var,[],{atom1_fn,0}},{c_fun,[],[],{c_literal,[],blue}}},
     {{c_var,[],{atom2_fn,0}},{c_fun,[],[],{c_literal,[],'Blue 4 U'}}},
     {{c_var,[],{string_fn,0}},{c_fun,[],[],{c_literal,[],"string theory"}}},
     {{c_var,[],{bish_fn,1}},
      {c_fun,[],
          [{c_var,[],'_cor0'}],
          {c_case,[],
              {c_var,[],'_cor0'},
              [{c_clause,[],
                   [{c_literal,[],bash}],
                   {c_literal,[],true},
                   {c_literal,[],berk}},
               {c_clause,[],
                   [{c_literal,[],bosh}],
                   {c_literal,[],true},
                   {c_literal,[],bork}},
               {c_clause,
                   [compiler_generated],
                   [{c_var,[],'_cor1'}],
                   {c_literal,[],true},
                   {c_primop,[],
                       {c_literal,[],match_fail},
                       [{c_tuple,[],
                            [{c_literal,[],case_clause},
                             {c_var,[],'_cor1'}]}]}}]}}},
     {{c_var,[],{module_info,0}},
      {c_fun,[],[],
          {c_call,[],
              {c_literal,[],erlang},
              {c_literal,[],get_module_info},
              [{c_literal,[],basic_types}]}}},
     {{c_var,[],{module_info,1}},
      {c_fun,[],
          [{c_var,[],'_cor0'}],
          {c_call,[],
              {c_literal,[],erlang},
              {c_literal,[],get_module_info},
              [{c_literal,[],basic_types},{c_var,[],'_cor0'}]}}}]}

But there are no line/column numbers. So I can get an AST that will generate JavaScript, but critically not Source Maps.

Question 1: How can I get the line information I need? (I can already get column information from the 'normal' Erlang tokens...)

Core Erlang is also slightly different from normal Erlang in the production process, because it substitutes its own internal names for variable names such as function parameters, which will also cause some Source Map problems. An example would be this Erlang clause:

bish_fn(A) ->
    case A of
        bash -> berk;
        bosh -> bork
    end.

The Erlang AST preserves the names nicely:

 [{function,
      {31,{1,8}},
      bish_fn,1,
      [{clause,
           {31,none},
           [{var,{31,{11,12}},'A'}],
           [[]],
           [{'case',
                {32,none},
                [{var,{32,{11,12}},'A'}],
                [{clause,
                     {33,none},
                     [{atom,{33,{9,13}},bash}],
                     [[]],
                     [{atom,{34,{13,17}},berk}]},
                 {clause,
                     {35,none},
                     [{atom,{35,{9,13}},bosh}],
                     [[]],
                      [{atom,{36,{13,17}},bork}]}]}]}]}]

Core Erlang has already mutated away the names of the function's parameters:

'bish_fn'/1 =
    %% Line 30
    fun (_cor0) ->
    %% Line 31
    case _cor0 of
      %% Line 32
      <'bash'> when 'true' ->
          'berk'
      %% Line 33
      <'bosh'> when 'true' ->
          'bork'
      ( <_cor1> when 'true' ->
        primop 'match_fail'
            ({'case_clause',_cor1})
        -| ['compiler_generated'] )
    end
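
To make the target concrete, this is roughly the JavaScript that either route would have to produce for bish_fn/1. It is a hand-written sketch that assumes atoms are represented as plain strings, and the default branch plays the role of the compiler-generated match_fail clause. The parameter is called A only because that name came from the Erlang AST; if the code were generated from the Core Erlang above, the parameter would come out as _cor0:

```javascript
// Hypothetical output for:
//   bish_fn(A) -> case A of bash -> berk; bosh -> bork end.
// Atoms are assumed to be represented as strings.
function bish_fn(A) {
  switch (A) {
    case "bash":
      return "berk";
    case "bosh":
      return "bork";
    default: {
      // Erlang raises a case_clause error here; mimic it by attaching the
      // error term {case_clause, A} to a JavaScript Error.
      const err = new Error("case_clause");
      err.erlangReason = ["case_clause", A];
      throw err;
    }
  }
}
```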

Question 2: Is there anything I can do to preserve or map variable names in Core Erlang?
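
One way to do this mapping by hand is sketched below in JavaScript. The sketch rests on three assumptions. Both trees must be available for the same function. Erlang terms are mirrored as arrays, so {var,{31,{11,12}},'A'} becomes ["var", [31, [11, 12]], "A"]. Core Erlang parameters line up by position with the parameters of the function's first Erlang clause. buildNameMap and originalName are hypothetical helpers. A parameter that is a pattern rather than a plain variable has no name to recover, so it keeps its Core name:

```javascript
// Hypothetical bookkeeping: pair each Core Erlang parameter name with the
// Erlang parameter in the same position (from the function's first clause
// in the Erlang AST), recording the original name and its position.
function buildNameMap(erlangParams, coreParams) {
  const names = new Map();
  coreParams.forEach((coreName, i) => {
    const erl = erlangParams[i];
    // Only plain variables carry a name worth restoring.
    if (erl && erl[0] === "var") {
      names.set(coreName, { name: erl[2], pos: erl[1] });
    }
  });
  return names;
}

// The name to emit (or to list in the Source Map "names" array) for a
// Core variable; unmapped compiler-generated names are returned unchanged.
function originalName(names, coreName) {
  const entry = names.get(coreName);
  return entry ? entry.name : coreName;
}
```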

Question 3: I appreciate that Core Erlang is explicitly designed to make it easy to compile into Erlang and to write tools that mutate Erlang code, but the real question is: will it make it easier to compile out of Erlang?

Options

I could fork the Core Erlang code and add a source-mapping option, but I play the Lazy Man card here...

Update

In response to Eric's answer, I should clarify how I am generating the Core Erlang cerl records. I first compile my plain Erlang to Core Erlang using:

c(some_module, to_core).

Then I use core_scan and core_parse in this function, nicked from compile.erl (in OTP's compiler application):

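compile(File) ->
    %% Read the .core file written by c(some_module, to_core).
    case file:read_file(File) of
        {ok,Bin} ->
            %% Scan the Core Erlang text into tokens...
            case core_scan:string(binary_to_list(Bin)) of
                {ok,Toks,_} ->
                    %% ...then parse the tokens into cerl records.
                    case core_parse:parse(Toks) of
                        {ok, Mod} ->
                            {ok, Mod};
                        {error,E} ->
                            {error, {parse, E}}
                    end;
                {error,E,_} ->
                    {error, {scan, E}}
            end;
        {error,E} ->
            {error,{read, E}}
    end.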

The question is: how do I (or can I) get that toolchain to emit an annotated AST? I suspect I would need to add those options myself :(

Asked Oct 18 '13 16:10 by Gordon Guthrie


1 Answer

  1. Line numbers are provided as annotations. If you look at the cerl module, which I really recommend you use, you will see that pretty much everything takes a list of annotations. One of those annotations is an unadorned number that represents the line number. If I remember correctly, if you were building the Core AST directly and the atom1_fn var was on line 10, the AST would look as follows:

    {c_var,[10],{atom1_fn,0}}

  2. No, you have to do all the bookkeeping yourself. There isn't anything out there to do it for you.

  3. I am not sure I understand this question.
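
Following point 1, getting the line number out of an annotation list is just a matter of finding the bare integer. Below is a small JavaScript sketch. It assumes the cerl records have been mirrored as arrays, so {c_var,[10],{atom1_fn,0}} becomes ["c_var", [10], ["atom1_fn", 0]], with the annotation list as the second element. lineOf is an illustrative name. Records read back through core_parse have empty annotation lists, so for them it simply returns null:

```javascript
// Assumption: cerl records mirrored as arrays, with the annotation list
// in position 1, e.g. ["c_var", [10], ["atom1_fn", 0]].
// Returns the first bare-integer annotation (the line number), or null
// when the record carries no line annotation.
function lineOf(record) {
  const annotations = record[1];
  if (!Array.isArray(annotations)) return null;
  const line = annotations.find(Number.isInteger);
  return line === undefined ? null : line;
}
```

Other annotations, such as the compiler_generated atom seen earlier, are skipped because only integers match.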

Everything Anthony said about Core Erlang is true. Those are the very same reasons I chose Core Erlang as a target language for Joxa. The lesson I learned from that is that while Core Erlang is a great, easy-to-target language, it has two major drawbacks that argue against it.

  1. Dialyzer only works with an Erlang AST in the abstract code block of the beam file. There is no way to get such an AST into that abstract code block when compiling to Core Erlang. So if you target Core Erlang, Dialyzer won't work for you. That is true regardless of whether or not you produce the correct spec attributes.

  2. You lose the use of tools that work on the Erlang AST, for example the ability to compile to Erlang source. The Core Erlang to/from source compilers are very buggy and simply do not work. Being able to get back to Erlang source is a major win in a lot of areas of pragmatic use.

I am actually in the process of retargeting Joxa to the Erlang AST for the above reasons.

Btw, you might be interested in this project: https://github.com/5HT/shen. It's a JavaScript compiler for the Erlang AST that already exists and works, though I don't have a lot of experience with it.

Edit: You can actually see the Core Erlang AST generated from Erlang source, which helps a ton when learning how to compile to Core. ec_compile in the erlware_commons repo has a lot of utility functions to help with that.

Answered Oct 24 '22 07:10 by ericbmerritt