
Drop-in, portable parsing

I see umpteen posts a day about "how to do X with regexen". And the best response to most of them seems like it would honestly be, "Why are you trying to drive a screw with a hammer?" But regexen are everywhere, and the syntax is mostly portable, particularly if you keep away from the fancy bits.

Is there anything equivalent to regexen but at the next level up in power and configurability? A "you can use it anywhere" parsing library of some variety, preferably with a gloriously concise DSL as its interface?

I've used Ragel somewhat, but because of the preprocessing step, I'd hesitate to recommend it to someone as "use this instead of some hairy regex". It's awkward to use from Obj-C, and I expect it will be terribly awkward from a language that doesn't have compile-link-run as part of its standard operating procedure.

What I'm looking for is something that will pass the "inline-online-universal" test.

  1. (inline) You can write the notation inline with your other code, as you would with a regex.

  2. (online) You can run the resulting parser just as you would your other code, which would mean right after input to a REPL in the case of something like Python.

  3. (universal) You can move to a different language/platform and use virtually the same code for your parser, modulo dialect differences. In reality, I'd be happy with something that works from Python, Ruby, C, Java, and Haskell.

Most tools I know of fall down at "online". They preprocess a grammar offline and spit out code in the target language (C, Python, Java, C++…). They're standalone tools that aren't themselves integrated into the language environment.

I've had suggestions of PEG parsers and lex/yacc combos. Parser combinator libraries might also be a good fit. Whatever you propose, your answer should demonstrate that it meets the inline-online-universal requirements by providing a working demo parser in Python, C, and Haskell. The demo example is up to the author, but it should be something painful using just regexen but trivial using a proper parser.
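To make the "inline" and "online" tests concrete, here is a minimal sketch of the parser-combinator style in Python. It is hand-rolled for illustration only: the helper names (`lit`, `seq`, `many`, `ref`, `matches`) are hypothetical and do not come from any particular library. It recognizes balanced parentheses, which classic regexen cannot do in general.

```python
# Hypothetical mini combinator toolkit; every name here is illustrative.
# A parser is a function (text, pos) -> new position on success, or None on failure.

def lit(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """Match each parser in order; fail if any one of them fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def many(p):
    """Match p zero or more times (greedy); never fails."""
    def parse(text, pos):
        while True:
            end = p(text, pos)
            if end is None or end == pos:  # stop on failure or no progress
                return pos
            pos = end
    return parse

def ref(thunk):
    """Look a parser up lazily so a grammar rule can refer to itself."""
    def parse(text, pos):
        return thunk()(text, pos)
    return parse

# Grammar, written inline with the rest of the code:
#   balanced <- ( "(" balanced ")" )*
balanced = many(seq(lit("("), ref(lambda: balanced), lit(")")))

def matches(parser, text):
    """True if the parser consumes the entire input."""
    return parser(text, 0) == len(text)
```

Pasting this into a REPL and calling `matches(balanced, "(()())")` runs the parser immediately, with no separate generation step. Whether the same notation carries over to C or Haskell with only dialect differences is exactly the open "universal" part of the question.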

Jeremy W. Sherman asked Sep 18 '12 19:09



1 Answer

https://github.com/leblancmeneses/NPEG

Implements PEG.

Meets all 3 ... let me explain.

It is inline only with C# and offline with all the others. C# has an offline version also.

To make it universal, I currently support offline versions for C, C++, JavaScript (local right now), and Java, all of which pass all unit tests. Adding another language takes about 25.84 hrs (how long it took to create the offline JavaScript version).

Making it online for every language is possible but would be too much maintenance; it took me a lot of work and time just to support the current offline versions. I can now focus my energy on building grammar optimizers and tooling to unit test grammar rules, which all offline versions benefit from.

Leblanc Meneses answered Oct 18 '22 06:10
