
Simplest nested block parser

Tags: lexer

I want to write a simple parser for a nested block syntax, just hierarchical plain-text. For example:

Some regular text.
This is outputted as-is, foo{but THIS
is inside a foo block}.

bar{
  Blocks can be multi-line
  and baz{nested}
}

What's the simplest way to do this? I've already written two working implementations, but they are overly complex: one used full-text regex matching, the other streaming char-by-char analysis.

I have to teach how it works to other people, so simplicity is paramount. I don't want to introduce a dependency on Lex/Yacc or Flex/Bison (or rather PEGjs/Jison, since this is JavaScript).

slezica asked Jun 15 '26 20:06


1 Answer

The good choices probably boil down as follows:

  • Given your constraints, it's going to be recursive descent. That's a fine way to go even without constraints.
  • You can either parse char-by-char (the traditional way) or write a lexical layer that uses the local string library to scan for { and }. Either way, you might want to return three terminal symbols plus EOF: BLOCK_OF_TEXT, LEFT_BRACE, and RIGHT_BRACE.
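
As a rough illustration of the two points above, here is a minimal sketch of a lexer plus recursive-descent parser in JavaScript. The function names (tokenize, parse) and the node shapes are hypothetical, chosen only for this example. It also assumes that a block's name is the run of word characters immediately before its {, which the question's sample suggests but does not state.

```javascript
// Lexical layer: split the input into BLOCK_OF_TEXT, LEFT_BRACE and
// RIGHT_BRACE tokens, followed by a single EOF token.
function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    const ch = input[pos];
    if (ch === '{') {
      tokens.push({ type: 'LEFT_BRACE' });
      pos++;
    } else if (ch === '}') {
      tokens.push({ type: 'RIGHT_BRACE' });
      pos++;
    } else {
      // Consume everything up to the next brace (or end of input).
      let end = pos;
      while (end < input.length && input[end] !== '{' && input[end] !== '}') end++;
      tokens.push({ type: 'BLOCK_OF_TEXT', value: input.slice(pos, end) });
      pos = end;
    }
  }
  tokens.push({ type: 'EOF' });
  return tokens;
}

// Recursive-descent parser for the grammar:
//   content := (BLOCK_OF_TEXT | block)*
//   block   := LEFT_BRACE content RIGHT_BRACE
function parse(input) {
  const tokens = tokenize(input);
  let i = 0; // index of the current token

  function parseContent() {
    const nodes = [];
    while (tokens[i].type !== 'RIGHT_BRACE' && tokens[i].type !== 'EOF') {
      let name = ''; // a block with no preceding name is anonymous
      if (tokens[i].type === 'BLOCK_OF_TEXT') {
        let text = tokens[i++].value;
        if (tokens[i].type === 'LEFT_BRACE') {
          // Assumption: trailing word characters before "{" name the block.
          const m = /\w+$/.exec(text);
          if (m) {
            name = m[0];
            text = text.slice(0, m.index);
          }
        }
        if (text !== '') nodes.push({ type: 'text', value: text });
      }
      if (tokens[i].type === 'LEFT_BRACE') nodes.push(parseBlock(name));
    }
    return nodes;
  }

  function parseBlock(name) {
    i++; // consume LEFT_BRACE
    const children = parseContent();
    if (tokens[i].type !== 'RIGHT_BRACE') {
      throw new Error('Unclosed block: ' + (name || '(anonymous)'));
    }
    i++; // consume RIGHT_BRACE
    return { type: 'block', name, children };
  }

  const tree = parseContent();
  if (tokens[i].type !== 'EOF') throw new Error('Unexpected }');
  return tree;
}
```

With this sketch, parse('foo{a}') returns one block node named foo whose only child is the text node a. Nesting needs no extra bookkeeping, because parseBlock simply calls parseContent again for the block body.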
DigitalRoss answered Jun 20 '26 00:06