
Regular expression parser generator [closed]

Sometimes it would be convenient to have a highly optimized function for regex search instead of pulling in a library that builds the matcher at runtime. Is there a parser generator that would fit such a role?

Ideally, it would:

  • create a single C function
  • generate a DFA corresponding to the given regular expression
  • be as efficient as KMP or Boyer-Moore in simple cases

Don Reba asked Jan 27 '12 18:01



2 Answers

Here is a list of tools that suit your needs:

  1. Lex/Flex is perhaps the best-known tool for constructing lexical analyzers (scanners) from regular expressions. Lex is useful in many scenarios, but it can impose too much overhead for simple matching tasks because its heavyweight processing loop imposes a stream "pull" model and input buffering. It was designed to process entire files rather than simple strings.

  2. re2c. It is a preprocessor that generates C recognizers from regular expressions. The generated state machines run very fast and integrate easily into any program, with no runtime dependencies.

  3. Ragel State Machine Compiler. Another preprocessor that generates FSM code from a high-level regular-language notation (regular expressions are one case of this notation). It works for a range of languages (C, C++, Objective-C, D, Java and Ruby), can execute user actions on different FSM events, etc. What is more, it can emit the state machine definition in Graphviz format for visualizing states and transitions.
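
To give a rough idea of what "a single C function implementing a DFA" means, here is a minimal hand-written sketch. It is not the output of any of the tools above, and the function name `match_decimal` and the example regular expression `[0-9]+\.[0-9]+` are my own assumptions for illustration. Generated code is usually lower level (re2c, for instance, typically emits labels and `goto`s rather than a state variable), but the idea is the same: one state per DFA node, one branch per input character, no backtracking.

```c
/* Hand-written illustration only, not generated by Lex, re2c or Ragel.
 * DFA for the regular expression [0-9]+\.[0-9]+
 * Returns 1 if the whole NUL-terminated string matches, 0 otherwise. */
int match_decimal(const char *s)
{
    /* One enumerator per DFA state; FRAC_PART is the only accepting state. */
    enum { START, INT_PART, DOT, FRAC_PART } state = START;

    for (; *s != '\0'; ++s) {
        int is_digit = (*s >= '0' && *s <= '9');

        switch (state) {
        case START:      /* need at least one digit */
            if (is_digit) state = INT_PART;
            else return 0;
            break;
        case INT_PART:   /* more digits, or the dot */
            if (is_digit) state = INT_PART;
            else if (*s == '.') state = DOT;
            else return 0;
            break;
        case DOT:        /* need at least one digit after the dot */
            if (is_digit) state = FRAC_PART;
            else return 0;
            break;
        case FRAC_PART:  /* only digits may follow */
            if (!is_digit) return 0;
            break;
        }
    }
    /* Accept only if the input ended in the accepting state. */
    return state == FRAC_PART;
}
```

Calling `match_decimal("3.14")` accepts, while `match_decimal("3.")` and `match_decimal("abc")` are rejected. Each input character is examined exactly once, which is the property you get from a DFA-based generator.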

Artem Zankovich answered Oct 06 '22 00:10



Lex and Flex are effectively regexp-to-C compilers.

Fred Foo answered Oct 06 '22 00:10
