
OCaml + Menhir Compiling/Writing

I'm a complete newbie when it comes to OCaml. I only started using the language about two weeks ago, but I've been tasked with writing a syntax analyzer (lexer + parser, whose job is to accept or reject a sentence) for a made-up language using Menhir. I've found some material on the internet about OCaml and Menhir:

The Menhir Manual.

This webpage for some French University course.

A short Menhir tutorial on Toss's homepage at Sourceforge.

A Menhir example on github by derdon.

A book on OCaml (with a few things about ocamllex + ocamlyacc).

A random ocamllex tutorial by SooHyoung Oh.

And the examples that come with Menhir's source code.

(I can't put more than two hyperlinks, so I can't link you directly to some of the websites I'm mentioning here. Sorry!)

So, as you can see, I've been desperately searching for more and more material to aid me in the making of this program. Unfortunately, I still cannot grasp many concepts, and as such, I'm having many, many difficulties.

For starters, I have no idea how to correctly compile my program. I've been using the following command:

ocamlbuild -use-menhir -menhir "menhir --external-tokens Tokens" main.native

My program is divided into four files: main.ml, lexer.mll, parser.mly and tokens.mly. main.ml is the part that reads its input from a file whose path is given as a command-line argument.

let filename = Sys.argv.(1)

let () =
    let inBuffer = open_in filename in
    let lineBuffer = Lexing.from_channel inBuffer in
    try
        let acceptance = Parser.main Lexer.main lineBuffer in
        match acceptance with
            | true -> print_string "Accepted!\n"
            | false -> print_string "Not accepted!\n"
    with
        | Lexer.Error msg -> Printf.fprintf stderr "%s%!\n" msg
        | Parser.Error -> Printf.fprintf stderr "At offset %d: syntax error.\n%!" (Lexing.lexeme_start lineBuffer)

The second file is lexer.mll.

{
  open Tokens
  exception Error of string
}

rule main = parse
  | [' ' '\t']+
      { main lexbuf }
  | ['0'-'9']+ as integer
      { INT (int_of_string integer) }
  | "True"
      { BOOL true }
  | "False"
      { BOOL false }
  | '+'
      { PLUS }
  | '-'
      { MINUS }
  | '*'
      { TIMES }
  | '/'
      { DIVIDE }
  | "def"
      { DEF }
  | "int"
      { INTTYPE }
  | ['A'-'Z' 'a'-'z' '_']['0'-'9' 'A'-'Z' 'a'-'z' '_']* as s
      { ID (s) }
  | '('
      { LPAREN }
  | ')'
      { RPAREN }
  | '>'
      { LARGER }
  | '<'
      { SMALLER }
  | ">="
      { EQLARGER }
  | "<="
      { EQSMALLER }
  | "="
      { EQUAL }
  | "!="
      { NOTEQUAL }
  | '~'
      { NOT }
  | "&&"
      { AND }
  | "||"
      { OR }
  | '('
      { LPAREN }
  | ')'
      { RPAREN }
  | "writeint"
      { WRITEINT }
  | '\n'
      { EOL }
  | eof
      { EOF }
  | _
      { raise (Error (Printf.sprintf "At offset %d: unexpected character.\n" (Lexing.lexeme_start lexbuf))) }

The third file is parser.mly.

%start <bool> main
%%

main:
| WRITEINT INT { true }

The fourth one is tokens.mly.

%token <string> ID
%token <int> INT
%token <bool> BOOL
%token EOF EOL DEF INTTYPE LPAREN RPAREN WRITEINT
%token PLUS MINUS TIMES DIVIDE
%token LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%token NOT AND OR

%left OR
%left AND
%nonassoc NOT
%nonassoc LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc LPAREN
%nonassoc ATTRIB

%{
type token =
  | ID of (string)
  | INT
  | BOOL
  | DEF
  | INTTYPE
  | LPAREN
  | RPAREN
  | WRITEINT
  | PLUS
  | MINUS
  | TIMES
  | DIVIDE
  | LARGER
  | SMALLER
  | EQLARGER
  | EQSMALLER
  | EQUAL
  | NOTEQUAL
  | NOT
  | AND
  | OR
  | EOF
  | EOL
%}

%%

Now, I know there are a lot of unused symbols here, but I intend to use them in my parser. No matter how many changes I make to the files, the compiler keeps blowing up in my face. I have tried everything I can think of, and nothing seems to work. What is making ocamlbuild explode in a plethora of errors about unbound constructors and undefined start symbols? What command should I be using to compile the program properly? Where can I find meaningful material to learn about Menhir?

Lopson asked Mar 27 '12 20:03


1 Answer

A simpler way to do this is to remove the Parser/Tokens separation. As Thomas noted, there is no need for a declaration type token = ..., because it is automatically produced by menhir from the %token directives.
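To make that concrete, here is roughly the type Menhir derives from the %token directives in the question and places in the generated parser.ml and parser.mli. This is a sketch written by hand, so the order of constructors in the real generated file may differ, and describe is a hypothetical helper added only to show the payloads. Note that INT and BOOL carry an argument here, whereas the hand-written type in tokens.mly declares them without one, which does not match the lexer's INT (int_of_string integer).

```ocaml
(* Roughly what Menhir generates from the %token directives;
   constructor order in the real generated file may differ. *)
type token =
  | ID of string
  | INT of int
  | BOOL of bool
  | EOF | EOL | DEF | INTTYPE | LPAREN | RPAREN | WRITEINT
  | PLUS | MINUS | TIMES | DIVIDE
  | LARGER | SMALLER | EQLARGER | EQSMALLER | EQUAL | NOTEQUAL
  | NOT | AND | OR

(* Hypothetical helper, not part of Menhir's output: it only shows that
   the types given in %token <...> become constructor arguments. *)
let describe (t : token) : string =
  match t with
  | ID s -> "ID " ^ s
  | INT n -> "INT " ^ string_of_int n
  | BOOL b -> "BOOL " ^ string_of_bool b
  | _ -> "keyword or operator"
```

Because the lexer does open Parser (or open Tokens in the split setup), these are the constructors its actions refer to.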

So you can define parser.mly as:

%start <bool> main

%token <string> ID
%token <int> INT
%token <bool> BOOL
%token EOF EOL DEF INTTYPE LPAREN RPAREN WRITEINT
%token PLUS MINUS TIMES DIVIDE
%token LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%token NOT AND OR

%left OR
%left AND
%nonassoc NOT
%nonassoc LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc LPAREN
%nonassoc ATTRIB
%%

main:
| WRITEINT INT { true }

and lexer.mll as:

{
  open Parser
  exception Error of string
}

[...] (* rest of the code not shown here *)

then remove tokens.mly, and compile with

ocamlbuild -use-menhir main.native

and it all works well.
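One thing to watch for once it compiles: ocamllex picks the longest match and, on a tie, the rule that appears first. In the lexer from the question, "writeint" is listed after the identifier rule, so that input would come back as ID rather than WRITEINT, and the single production in main would not match. Moving the keyword rules above the identifier rule avoids this. Another common approach is to match identifiers only and look keywords up in a table. Below is a minimal sketch of the lookup idea in plain OCaml, using a cut-down token type and a hypothetical helper name keyword_or_id (neither comes from the original code).

```ocaml
(* Cut-down, hand-written token type for illustration only;
   in the real project the type comes from Menhir. *)
type token = DEF | INTTYPE | WRITEINT | ID of string

(* The keywords that the lexer in the question recognises. *)
let keywords = [ ("def", DEF); ("int", INTTYPE); ("writeint", WRITEINT) ]

(* Classify a word that the identifier pattern has already matched:
   return the keyword token if there is one, otherwise an ID. *)
let keyword_or_id (s : string) : token =
  match List.assoc_opt s keywords with
  | Some tok -> tok
  | None -> ID s
```

With this in the lexer header, the action of the identifier rule would become { keyword_or_id s } and the separate "def", "int" and "writeint" rules could be dropped.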

gasche answered Oct 08 '22 17:10