Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create a parser

Tags:

go

I want to build a parser but have some problems understanding how to do this.

Sample string I would like to parse

{key1 = value1 | key2 = {key3 = value3} | key4 = {key5 = { key6 = value6 }}}

Preferably I would like to get an output similar to a nested map

map[key1] = value1
map[key2] = (map[key3] = value3)
map[key4] = (map[key5] = (map[key6] = value6))

How could this be done? Am I aiming in the wrong direction?

like image 947
fritjof Avatar asked Dec 07 '11 20:12

fritjof


People also ask

How hard is it to write a parser?

Writing parsers can be challenging. The problem is simple to describe: convert an input string into a syntax tree. The same language can be parsed by many different algorithms, and all have different tradeoffs with regards to speed, memory usage, readability, and maintainability.

How do you write parser in Python?

The basic workflow of a parser generator tool is quite simple: you write a grammar that defines the language, or document, and you run the tool to generate a parser usable from your Python code.

How parsing is done?

Traditionally, parsing is done by taking a sentence and breaking it down into different parts of speech. The words are placed into distinct grammatical categories, and then the grammatical relationships between the words are identified, allowing the reader to interpret the sentence.


4 Answers

Writing a parser is a complicated topic that is too big to cover in a single answer.

Rob Pike gave an excellent talk that walks through writing a lexer (which is a half of the parser) in Go: http://www.youtube.com/watch?v=HxaD_trXwRE

You should also look at e.g. parser code in Go standard library for an example on how to do it: http://golang.org/src/pkg/go/parser/parser.go

There's also plenty resources on parsing on the internet. They might have examples in other languages but it's just a matter of translating the syntax to Go.

I recommend reading up on recursive descent parsing (e.g. http://www.cs.binghamton.edu/~zdu/parsdemo/recintro.html) or top down parsing (e.g. http://javascript.crockford.com/tdop/tdop.html, http://effbot.org/zone/simple-top-down-parsing.htm).

like image 88
Krzysztof Kowalczyk Avatar answered Oct 09 '22 15:10

Krzysztof Kowalczyk


What about using the standard goyacc tool? Here is a skeleton:

main.y

%{
package main

import (
    "fmt"
    "log"
)
%}

%union{
    tok int
    val interface{}
    pair struct{key, val interface{}}
    pairs map[interface{}]interface{}
}

%token KEY
%token VAL

%type <val> KEY VAL
%type <pair> pair
%type <pairs> pairs

%%

goal:
    '{' pairs '}'
    {
        yylex.(*lex).m = $2
    }

pairs:
    pair
    {
        $$ = map[interface{}]interface{}{$1.key: $1.val}
    }
|   pairs '|' pair
    {
        $$[$3.key] = $3.val
    }

pair:
    KEY '=' VAL
    {
        $$.key, $$.val = $1, $3
    }
|   KEY '=' '{' pairs '}'
    {
        $$.key, $$.val = $1, $4
    }


%%

type token struct {
    tok int
    val interface{}
}

type lex struct {
    tokens []token
    m map[interface{}]interface{}
}

func (l *lex) Lex(lval *yySymType) int {
    if len(l.tokens) == 0 {
        return 0
    }

    v := l.tokens[0]
    l.tokens = l.tokens[1:]
    lval.val = v.val
    return v.tok
}

func (l *lex) Error(e string) {
    log.Fatal(e)
}

func main() {
    l := &lex{
        // {key1 = value1 | key2 = {key3 = value3} | key4 = {key5 = { key6 = value6 }}}
        []token{
            {'{', ""},
            {KEY, "key1"},
            {'=', ""},
            {VAL, "value1"},
            {'|', ""},
            {KEY, "key2"},
            {'=', ""}, 
            {'{', ""},
            {KEY, "key3"},
            {'=', ""},
            {VAL, "value3"},
            {'}', ""},
            {'|', ""},
            {KEY, "key4"},
            {'=', ""},
            {'{', ""},
            {KEY, "key5"},
            {'=', ""},
            {'{', ""},
            {KEY, "key6"},
            {'=', ""},
            {VAL, "value6"},
            {'}', ""},
            {'}', ""},
            {'}', ""},
        },
        map[interface{}]interface{}{},
    }
    yyParse(l)
    fmt.Println(l.m)
}

Output

$ go tool yacc -o main.go main.y && go run main.go
map[key4:map[key5:map[key6:value6]] key1:value1 key2:map[key3:value3]]
$ 
like image 43
zzzz Avatar answered Oct 09 '22 15:10

zzzz


Be advised that, with Go 1.8 (currently in beta in Q4 2016, released in Q1 2017)

The yacc tool (previously available by running “go tool yacc”) has been removed.
As of Go 1.7 it was no longer used by the Go compiler.

It has moved to the “tools” repository and is now available at golang.org/x/tools/cmd/goyacc.

like image 5
VonC Avatar answered Oct 09 '22 17:10

VonC


That particular format is very similar to json. You could use the following code to leverage that similarity:

    var txt = `{key1 = "\"value1\"\n" | key2 = { key3 = 10 } | key4 = {key5 = { key6 = value6}}}`
    var s scanner.Scanner
    s.Init(strings.NewReader(txt))
    var b []byte

loop:
    for {
        switch tok := s.Scan(); tok {
        case scanner.EOF:
            break loop
        case '|':
            b = append(b, ',')
        case '=':
            b = append(b, ':')
        case scanner.Ident:
            b = append(b, strconv.Quote(s.TokenText())...)
        default:
            b = append(b, s.TokenText()...)
        }
    }

    var m map[string]interface{}
    err := json.Unmarshal(b, &m)
    if err != nil {
        // handle error
    }

    fmt.Printf("%#v\n",m)
like image 2
SteveMcQwark Avatar answered Oct 09 '22 15:10

SteveMcQwark