I want to build a parser but am having some trouble understanding how to go about it.
Here is a sample string I would like to parse:
{key1 = value1 | key2 = {key3 = value3} | key4 = {key5 = { key6 = value6 }}}
Preferably, I would like to get output similar to a nested map:
map[key1] = value1
map[key2] = (map[key3] = value3)
map[key4] = (map[key5] = (map[key6] = value6))
How could this be done? Am I aiming in the wrong direction?
Writing parsers can be challenging. The problem is simple to describe: convert an input string into a syntax tree. The same language can be parsed by many different algorithms, and they all have different tradeoffs with regard to speed, memory usage, readability, and maintainability.
The basic workflow of a parser generator tool is quite simple: you write a grammar that defines the language, or document, and you run the tool to generate a parser usable from your Go code.
The term "parsing" comes from natural language: a sentence is broken down into parts of speech, the words are placed into distinct grammatical categories, and the grammatical relationships between them are identified, which is what allows the reader to interpret the sentence. A parser for a formal language does the same thing with tokens and grammar rules.
Writing a parser is a complicated topic that is too big to cover in a single answer.
Rob Pike gave an excellent talk that walks through writing a lexer (which is one half of a parser) in Go: http://www.youtube.com/watch?v=HxaD_trXwRE
You should also look at, for example, the parser code in the Go standard library to see how it's done: http://golang.org/src/pkg/go/parser/parser.go
There are also plenty of resources on parsing on the internet. They might have examples in other languages, but it's just a matter of translating the syntax to Go.
I recommend reading up on recursive descent parsing (e.g. http://www.cs.binghamton.edu/~zdu/parsdemo/recintro.html) or top-down parsing (e.g. http://javascript.crockford.com/tdop/tdop.html, http://effbot.org/zone/simple-top-down-parsing.htm).
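For the input in the question, a recursive descent parser can stay quite small. The following is only a rough sketch of that approach (not taken from the linked articles); it assumes keys and values are plain identifiers and uses text/scanner for tokenizing:

package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// parser holds the scanner and the current lookahead token.
type parser struct {
	s   scanner.Scanner
	tok rune
}

func (p *parser) next() { p.tok = p.s.Scan() }

// parseMap parses "{ key = value | key = { ... } | ... }" into a nested map.
func (p *parser) parseMap() (map[string]interface{}, error) {
	if p.tok != '{' {
		return nil, fmt.Errorf("expected '{', got %q", p.s.TokenText())
	}
	p.next()
	m := map[string]interface{}{}
	for {
		if p.tok != scanner.Ident {
			return nil, fmt.Errorf("expected key, got %q", p.s.TokenText())
		}
		key := p.s.TokenText()
		p.next()
		if p.tok != '=' {
			return nil, fmt.Errorf("expected '=', got %q", p.s.TokenText())
		}
		p.next()
		var val interface{}
		switch p.tok {
		case '{': // nested map: recurse
			nested, err := p.parseMap()
			if err != nil {
				return nil, err
			}
			val = nested
		case scanner.Ident: // plain value
			val = p.s.TokenText()
			p.next()
		default:
			return nil, fmt.Errorf("expected value, got %q", p.s.TokenText())
		}
		m[key] = val
		if p.tok != '|' {
			break
		}
		p.next()
	}
	if p.tok != '}' {
		return nil, fmt.Errorf("expected '}', got %q", p.s.TokenText())
	}
	p.next()
	return m, nil
}

func main() {
	const input = `{key1 = value1 | key2 = {key3 = value3} | key4 = {key5 = { key6 = value6 }}}`
	p := &parser{}
	p.s.Init(strings.NewReader(input))
	p.next() // load the first token
	m, err := p.parseMap()
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(m) // prints the nested map, e.g. map[key1:value1 key2:map[key3:value3] key4:map[key5:map[key6:value6]]]
}

Each grammar rule (map, pair, value) becomes one small function or branch, which is what makes recursive descent easy to read and extend.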
What about using the standard goyacc tool? Here is a skeleton (main.y):
%{
package main

import (
	"fmt"
	"log"
)
%}

%union{
	tok   int
	val   interface{}
	pair  struct{ key, val interface{} }
	pairs map[interface{}]interface{}
}

%token KEY
%token VAL

%type <val> KEY VAL
%type <pair> pair
%type <pairs> pairs

%%

goal:
	'{' pairs '}'
	{
		yylex.(*lex).m = $2
	}

pairs:
	pair
	{
		$$ = map[interface{}]interface{}{$1.key: $1.val}
	}
|	pairs '|' pair
	{
		$$[$3.key] = $3.val
	}

pair:
	KEY '=' VAL
	{
		$$.key, $$.val = $1, $3
	}
|	KEY '=' '{' pairs '}'
	{
		$$.key, $$.val = $1, $4
	}

%%
// token is one lexical token: its kind (KEY, VAL or a literal rune) and its value.
type token struct {
	tok int
	val interface{}
}

// lex feeds a pre-tokenized stream to the parser and collects the result in m.
type lex struct {
	tokens []token
	m      map[interface{}]interface{}
}

// Lex implements the yyLexer interface: it returns the next token's kind and
// stores its value in lval.
func (l *lex) Lex(lval *yySymType) int {
	if len(l.tokens) == 0 {
		return 0
	}
	v := l.tokens[0]
	l.tokens = l.tokens[1:]
	lval.val = v.val
	return v.tok
}

// Error implements the yyLexer interface.
func (l *lex) Error(e string) {
	log.Fatal(e)
}

func main() {
	l := &lex{
		// Hand-tokenized form of:
		// {key1 = value1 | key2 = {key3 = value3} | key4 = {key5 = { key6 = value6 }}}
		[]token{
			{'{', ""},
			{KEY, "key1"},
			{'=', ""},
			{VAL, "value1"},
			{'|', ""},
			{KEY, "key2"},
			{'=', ""},
			{'{', ""},
			{KEY, "key3"},
			{'=', ""},
			{VAL, "value3"},
			{'}', ""},
			{'|', ""},
			{KEY, "key4"},
			{'=', ""},
			{'{', ""},
			{KEY, "key5"},
			{'=', ""},
			{'{', ""},
			{KEY, "key6"},
			{'=', ""},
			{VAL, "value6"},
			{'}', ""},
			{'}', ""},
			{'}', ""},
		},
		map[interface{}]interface{}{},
	}
	yyParse(l)
	fmt.Println(l.m)
}
$ go tool yacc -o main.go main.y && go run main.go
map[key4:map[key5:map[key6:value6]] key1:value1 key2:map[key3:value3]]
$
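The token slice in main above is hard-wired for the example input. As a rough sketch (not part of the original skeleton), the same slice could be built from the raw string with text/scanner; the tokenize helper below is hypothetical and assumes "strings" and "text/scanner" are added to the imports in the %{ ... %} block:

// tokenize is a hypothetical helper that turns the raw input into the token
// stream expected by the lexer above, using text/scanner.
func tokenize(input string) []token {
	var s scanner.Scanner
	s.Init(strings.NewReader(input))
	var toks []token
	for t := s.Scan(); t != scanner.EOF; t = s.Scan() {
		if t == scanner.Ident {
			// Mark every identifier as KEY for now; fixed up below.
			toks = append(toks, token{KEY, s.TokenText()})
		} else {
			// '{', '}', '|' and '=' pass through as their rune values.
			toks = append(toks, token{int(t), ""})
		}
	}
	// An identifier is a KEY only if it is followed by '='; otherwise it is a VAL.
	for i := range toks {
		if toks[i].tok == KEY && (i+1 == len(toks) || toks[i+1].tok != '=') {
			toks[i].tok = VAL
		}
	}
	return toks
}

With that in place, main could pass tokenize(input) instead of the literal slice.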
Be advised that, with Go 1.8 (in beta in Q4 2016, released in Q1 2017):
The yacc tool (previously available by running "go tool yacc") has been removed. As of Go 1.7 it was no longer used by the Go compiler. It has moved to the "tools" repository and is now available at golang.org/x/tools/cmd/goyacc.
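So with Go 1.8 or later, the skeleton above is built with the external goyacc command instead, along these lines (the exact commands depend on your Go version and module setup):
$ go get golang.org/x/tools/cmd/goyacc
$ goyacc -o main.go main.y && go run main.go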
That particular format is very similar to JSON. You could use the following code to leverage that similarity:
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
	"text/scanner"
)

func main() {
	var txt = `{key1 = "\"value1\"\n" | key2 = { key3 = 10 } | key4 = {key5 = { key6 = value6}}}`

	var s scanner.Scanner
	s.Init(strings.NewReader(txt))
	var b []byte
loop:
	for {
		switch tok := s.Scan(); tok {
		case scanner.EOF:
			break loop
		case '|':
			b = append(b, ',')
		case '=':
			b = append(b, ':')
		case scanner.Ident:
			// Quote bare identifiers so they become valid JSON strings.
			b = append(b, strconv.Quote(s.TokenText())...)
		default:
			// Braces, numbers and string literals pass through unchanged.
			b = append(b, s.TokenText()...)
		}
	}

	var m map[string]interface{}
	err := json.Unmarshal(b, &m)
	if err != nil {
		// handle error
	}
	fmt.Printf("%#v\n", m)
}