I'm curious, does anybody know a good way to do regular expression matching in C? The only way I can think of is through Flex. Is this the only way or is there a better way?
Thanks!
Uh ... The best way is to
#include <regex.h>
That is the POSIX standard API for regular expressions.
For non-POSIX systems, rolling your own is one alternative, a basic regex engine is not too hard to accomplish. I'm sure there are off-the-shelf solutions too, I haven't neded one.
Come to think of it, I think glib has one.
Depending on what dialect you're looking for and what platform you're on:
Henry Spencer has also done another regex library, used by the current versions of TCL and PostgreSQL. This one is interesting because it is a hybrid NFA/DFA implementation.
Regexes that accept the same string multiple ways (like a*a?) inherently require an NFA. The usual implementation models the NFA as a DFA using backtracking, which can be as much as O(2^length(input)) for particularly degenerate pattern. But, the simple recursive implementation can be extended with capturing, backreferences, callbacks to code, and all the usual "extra" features that many languages offer besides testing for matches.
The "NFA" approach tracks multiple current states and updates all of them with each incoming character (See Ross Cox's writeup of Thompson NFA regexes for more explanation). This approach is O(input.length*pattern.length), which is faster - immensely so in the worst cases - but cannot perform backrefs or captures since it doesn't keep track of how it got to a state.
Spencer's approach is a hybrid, compiling some portions of a pattern to the NFA approach, and using backtracking only where necessary for captures that were actually requested. This is often a substantial win (see TCL's position in the regex-dna shootout).
Look this: http://www.pcre.org/
It is a VERY nice lib!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With