
Simplest of parsers in go tool yacc

Tags: go, yacc

Using this command:

go tool yacc -p Verb -o verb.go boilerplate.y

Trying to build this yacc file:

// boilerplate.y
%{

package main

import (
    "bufio"
    "fmt"
    "os"
    "unicode"
)

%}

%% 

.|\n   ECHO;

%%

func main() {
    fi := bufio.NewReader(os.NewFile(0, "stdin"))
    s, err := fi.ReadString('\n')
    if err != nil {
        fmt.Println("error", err)
    }

    VerbParse(&VerbLex{s: s})
}

Error: bad syntax on first rule: boilerplate.y:16

Successfully got this example to work:

https://github.com/golang-samples/yacc/blob/master/simple/calc.y

I am trying to build my own parser while working through the lex & yacc book. Resources for Go seem limited to non-existent.

Justin Thomas asked Jul 28 '16 19:07




1 Answer

You have an incorrect rule in your specification.

A specification file has the following structure:

declarations
%%
rules
%%
programs

Where a rule is defined as:

A  :  BODY  ;

Where A is a non-terminal symbol, while BODY is made up of tokens (terminal symbols), non-terminals and literals. The : and ; are required components of rule declaration syntax.

Hence the rule:

.|\n   ECHO;

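is syntactically incorrect: it is a lex-style pattern/action pair (pattern .|\n, action ECHO), not a yacc rule of the form A : BODY ;.

Since you are simply trying to echo the input, a very simple implementation based on calc.y would be the following (file echo.y):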

rules

%%

in : /* empty */
  | in input '\n'
     { fmt.Printf("Read character: %s\n", $2) }
  ;

input : CHARACTER
  | input CHARACTER
      { $$ = $1 + $2 }
  ;

program

%%

type InputLex struct {
    // contains one complete input string (with the trailing \n)
    s string
    // used to keep track of parser position along the above input string
    pos int
}

func (l *InputLex) Lex(lval *InputSymType) int {
    var c rune = ' '

    // skip through all the spaces, both at the ends and in between
    for c == ' ' {
        if l.pos == len(l.s) {
            return 0
        }
        c = rune(l.s[l.pos])
        l.pos += 1
    }

    // only look for input characters that are either digits or lower case
    // to do more specific parsing, you'll define more tokens and have a 
    // more complex parsing logic here, choosing which token to return
    // based on parsed input
    if unicode.IsDigit(c) || unicode.IsLower(c) {
        lval.val = string(c)
        return CHARACTER
    }

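    // any other character (including the trailing '\n') is returned as its
    // own character code, which yacc treats as a literal token; a character
    // the grammar does not expect results in a syntax error
    return int(c)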
}

func (l *InputLex) Error(s string) {
    fmt.Printf("syntax error: %s\n", s)
}

func main() {
    // same as in calc.y
}

func readline(fi *bufio.Reader) (string, bool) {
    // same as in calc.y
}

To compile and run this program, run the following at the command prompt:

go tool yacc -o echo.go -p Input echo.y
go run echo.go

As you can see, you'll have to define your own tokenizing logic in the Lex method. The struct InputLex holds the input string and the current position while your input is being parsed. InputSymType is auto-generated and is defined by the %union declared in the declarations part of the specification.

As far as I can tell, there is no way to use JISON or regular expressions directly for the matching with Go's yacc tool. You may have to take a look at some other libraries.

More details can be found here: http://dinosaur.compilertools.net/yacc/

Full working code here: https://play.golang.org/p/u1QxwRKLCl

abhink answered Oct 10 '22 18:10