
Better solution than lex/yacc for parsing a DSL in C?

Tags: c, parsing, dsl

One of my programs accepts commands (like kill foo) at runtime. Think of it as a little domain-specific language. Here are a few examples:

kill
kill client
exit

Chained commands are also allowed, and whitespace before and after commands is not significant, so the following examples are valid too:

kill ; say "that was fun"
  kill  ;  kill      ; kill;

I have currently implemented this with lex/yacc (flex/bison, to be specific), and that has caused a lot of headaches. The lexer depends heavily on context (for example, whitespace tokens are generally not returned, except after a kill keyword) and has many different states. The grammar used to have conflicts, and I don’t really like the format in which it has to be specified (especially the $1, $2, $3, … used to refer to the values of a rule's symbols). Also, the error messages bison provides at parse time are sometimes accurate but often not: the kill command with optional arguments leads to messages like Unexpected $undefined, expected $end or ; for kill clont instead of kill client. Lastly, the C API for yacc is cruel (external defines all over the place).

I am not asking you to solve all of the problems above (I will open separate threads with more specific descriptions and code if there is no way around lex/yacc). Instead, I am interested in alternatives to lex/yacc.

My criteria are the following:

  • Input is a string (const char *). There is no output; instead, some code should be called for each keyword.
  • I want to use this with C (C99).
  • The software should already be included in the major Linux distros, or at least be easy to bundle / package.
  • It should be well-documented.
  • The syntax for describing my language should be easy.
  • It should output meaningful error messages upon parsing errors.
  • Performance is not that important (of course it should be fast, but the typical use case is interactive usage, not processing many megabytes of commands).
Michael, asked Jan 19 '23 18:01


1 Answer

For such a simple and small grammar, I'd consider writing the lexer/parser by hand. It's often not that much work.
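As a rough illustration of the hand-written route, here is a minimal C99 sketch of a parser for the commands shown in the question. Everything named here (parse_commands, cmd_handler, cmd_kind, count_command) is made up for this example and is not part of any library. It also assumes that kill takes at most one optional argument, which must be the word client, that say takes a double-quoted string without escape sequences, and that commands are separated by ; with an optional trailing ;.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum cmd_kind { CMD_KILL, CMD_SAY, CMD_EXIT };

/* Hypothetical callback: invoked once per command; arg is "" if absent. */
typedef void (*cmd_handler)(enum cmd_kind kind, const char *arg, void *user);

static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

/* Copy a run of letters into buf (truncating if needed); return the
 * position just after the run. buf is "" if no letter was found. */
static const char *read_word(const char *p, char *buf, size_t buflen)
{
    size_t n = 0;
    while (isalpha((unsigned char)*p)) {
        if (n + 1 < buflen)
            buf[n++] = *p;
        p++;
    }
    buf[n] = '\0';
    return p;
}

/* Read a double-quoted string (no escapes); return NULL if malformed. */
static const char *read_string(const char *p, char *buf, size_t buflen)
{
    size_t n = 0;
    if (*p != '"')
        return NULL;
    for (p++; *p != '"'; p++) {
        if (*p == '\0')
            return NULL; /* unterminated string */
        if (n + 1 < buflen)
            buf[n++] = *p;
    }
    buf[n] = '\0';
    return p + 1;
}

/* Parse input and call handler for each command.
 * Returns 0 on success, or -1 with a message written to err. */
int parse_commands(const char *input, cmd_handler handler, void *user,
                   char *err, size_t errlen)
{
    const char *p = input;
    char word[32], arg[128];

    for (;;) {
        p = skip_ws(p);
        if (*p == '\0')
            return 0; /* empty input, or end reached after a ';' */
        const char *start = p;
        p = read_word(p, word, sizeof word);
        arg[0] = '\0';
        if (strcmp(word, "kill") == 0) {
            const char *a = skip_ws(p);
            p = read_word(a, arg, sizeof arg); /* optional target */
            if (arg[0] != '\0' && strcmp(arg, "client") != 0) {
                snprintf(err, errlen, "column %d: unknown kill target "
                         "'%s', expected 'client'", (int)(a - input) + 1, arg);
                return -1;
            }
            handler(CMD_KILL, arg, user);
        } else if (strcmp(word, "say") == 0) {
            const char *a = skip_ws(p);
            p = read_string(a, arg, sizeof arg);
            if (p == NULL) {
                snprintf(err, errlen, "column %d: say expects a quoted "
                         "string", (int)(a - input) + 1);
                return -1;
            }
            handler(CMD_SAY, arg, user);
        } else if (strcmp(word, "exit") == 0) {
            handler(CMD_EXIT, arg, user);
        } else {
            snprintf(err, errlen, "column %d: unknown command '%s'",
                     (int)(start - input) + 1, word);
            return -1;
        }
        p = skip_ws(p);
        if (*p == ';') {
            p++;
        } else if (*p != '\0') {
            snprintf(err, errlen, "column %d: expected ';' or end of input",
                     (int)(p - input) + 1);
            return -1;
        }
    }
}

/* Placeholder handler for trying the parser: counts commands in *user. */
static void count_command(enum cmd_kind kind, const char *arg, void *user)
{
    (void)kind;
    (void)arg;
    ++*(int *)user;
}
```

With these assumptions, passing "kill clont" to parse_commands returns -1 and fills err with column 6: unknown kill target 'clont', expected 'client', which is the kind of message that is hard to get out of a generated parser. Note that handlers run as each command is parsed, so a syntax error later in the input does not undo earlier commands. If that matters, collect the commands first and dispatch them only after the whole input has parsed.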

Virtually all Linux distros ship a variant of lex/yacc. Beyond those, two other widely used parser generators are Lemon and ANTLR.

nos, answered Jan 23 '23 16:01