Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to deal with variable references in yacc/bison (with ocaml)

I was wondering how to deal with variable references inside statements while writing grammars with ocamlyacc and ocamllex.

The problem is that statements of the form

var x = y + z
var b = true | f;

should be both correct but in the first case variable refers to numbers while in the second case f is a boolean variable.

In the grammar I'm writing I have got this:

numeric_exp_val:
  | nint { Syntax.Int $1 }
  | FLOAT { Syntax.Float $1 }
  | LPAREN; ne = numeric_exp; RPAREN { ne }
  | INCR; r = numeric_var_ref { Syntax.VarIncr (r,1) }
  | DECR; r = numeric_var_ref { Syntax.VarIncr (r,-1) }
  | var_ref { $1 }
;

boolean_exp_val:
  | BOOL { Syntax.Bool $1 }
  | LPAREN; be = boolean_exp; RPAREN { be }
  | var_ref { $1 }
;

which obviously can't work, since both var_ref non terminals reduce to the same (reduce/reduce conflict). But I would like to have type checking that is mostly statically done (with respect to variable references) during the parsing phase itself.

That's why I'm wondering which is the best way to have variable references and keep this structure. Just as an additional info I have functions that compile the syntax tree by translating it into a byte code similar to this one:

let rec compile_numeric_exp exp =
  match exp with
    Int i -> [Push (Types.I.Int i)]
    | Float f -> [Push (Types.I.Float f)]
    | Bop (BNSum,e1,e2) -> (compile_numeric_exp e1) @ (compile_numeric_exp e2) @ [Types.I.Plus]
    | Bop (BNSub,e1,e2) -> (compile_numeric_exp e1) @ (compile_numeric_exp e2) @ [Types.I.Minus]
    | Bop (BNMul,e1,e2) -> (compile_numeric_exp e1) @ (compile_numeric_exp e2) @ [Types.I.Times]
    | Bop (BNDiv,e1,e2) -> (compile_numeric_exp e1) @ (compile_numeric_exp e2) @ [Types.I.Div]
    | Bop (BNOr,e1,e2) -> (compile_numeric_exp e1) @ (compile_numeric_exp e2) @ [Types.I.Or]
    | VarRef n -> [Types.I.MemoryGet (Memory.index_for_name n)]
    | VarIncr ((VarRef n) as vr,i) -> (compile_numeric_exp vr) @ [Push (Types.I.Int i);Types.I.Plus;Types.I.Dupe] @ (compile_assignment_to n) 
    | _ -> []
like image 945
Jack Avatar asked Dec 12 '22 07:12

Jack


2 Answers

Parsing is simply not the right place to do type-checking. I don't understand why you insist on doing this in this pass. You would have much clearer code and greater expressive power by doing it in a separate pass.

Is it for efficiency reasons? I'm confident you could devise efficient incremental-typing routines elsewhere, to be called from the grammar production (but I'm not sure you'll win that much). This looks like premature optimization.

There has been work on writing type systems as attribute grammars (which could be seen as a declarative way to express typing derivations), but I don't think it is meant to conflate parsing and typing in a single pass.

If you really want to go further in this direction, I would advise you to use a simple lexical differentiation between num-typed and bool-typed variables. This sounds ugly but is simple.

like image 151
gasche Avatar answered Dec 28 '22 10:12

gasche


If you want to treat numeric expressions and boolean expressions as different syntactic categories, then consider how you must parse var x = ( ( y + z ) ). You don't know which type of expression you're parsing until you hit the +. Therefore, you need to eat up several tokens before you know whether you are seeing a numeric_exp_val or a boolean_exp_val: you need some unbounded lookahead. Yacc does not provide such lookahead (Yacc only provides a restricted form of lookahead, roughly described as LALR, which puts bounds on parsing time and memory requirements). There is even an ambiguous case that makes your grammar context-sensitive: with a definition like var x = y, you need to look up the type of y.

You can solve this last ambiguity by feeding back the type information into the lexer, and you can solve the need for lookahead by using a parser generator that supports unbounded lookahead. However, both of these techniques will push your parser towards a point where it can't easily evolve if you want to expand the language later on (for example to distinguish between integer and floating-point numbers, to add strings or lists, etc.).

If you want a simple but constraining fix with a low technological overhead, I'll second gasche's suggestion of adding a syntactic distinguisher for numeric and boolean variable definitions, something like bvar b = … and nvar x = …. There again, this will make it difficult to support other types later on.

You will have an easier time overall if you separate the type checking from the parsing. Once you've built an abstract syntax tree, do a pass of type checking (in which you will infer the type of variables.

type numeric_expression = Nconst of float | Nplus of numeric_expression * numeric_expression | …
 and boolean_expression = Bconst of bool | Bor of boolean_expression * boolean_expression | …
type typed_expression = Tnum of numeric_expression | Tbool of boolean_expression
type typed_statement = Tvar of string * typed_expression
let rec type_expression : Syntax.expression -> typed_expression = function
  | Syntax.Float x -> Tnum (Nconst x)
  | Syntax.Plus (e1, e2) ->
      begin match type_expression e1, type_expression e2 with
        | Tnum n1, Tnum n2 -> Tnum (Nplus (n1, n2))
        | _, (Tbool _ as t2) -> raise (Invalid_argument_type ("+", t2))
        | (Tbool _ as t1), _ -> raise (Invalid_argument_type ("+", t1))
      end
  | …
like image 26
Gilles 'SO- stop being evil' Avatar answered Dec 28 '22 08:12

Gilles 'SO- stop being evil'