Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular Expressions in C

Tags:

c

regex

I'm curious, does anybody know a good way to do regular expression matching in C? The only way I can think of is through Flex. Is this the only way or is there a better way?

Thanks!

like image 449
Scott Avatar asked Apr 07 '09 13:04

Scott


3 Answers

Uh ... The best way is to

#include <regex.h>

That is the POSIX standard API for regular expressions.

For non-POSIX systems, rolling your own is one alternative, a basic regex engine is not too hard to accomplish. I'm sure there are off-the-shelf solutions too, I haven't neded one.

Come to think of it, I think glib has one.

like image 61
unwind Avatar answered Nov 11 '22 00:11

unwind


Depending on what dialect you're looking for and what platform you're on:

  • POSIX regular expressions are likely to be in the standard C library - see <regex.h> and regcomp, regerror, regexec, regfree.
  • libPCRE provides most of perl's extended regex features
  • glib has GRegex
  • Henry Spencer's Tcl Regex library.

Henry Spencer has also done another regex library, used by the current versions of TCL and PostgreSQL. This one is interesting because it is a hybrid NFA/DFA implementation.

Regexes that accept the same string multiple ways (like a*a?) inherently require an NFA. The usual implementation models the NFA as a DFA using backtracking, which can be as much as O(2^length(input)) for particularly degenerate pattern. But, the simple recursive implementation can be extended with capturing, backreferences, callbacks to code, and all the usual "extra" features that many languages offer besides testing for matches.

The "NFA" approach tracks multiple current states and updates all of them with each incoming character (See Ross Cox's writeup of Thompson NFA regexes for more explanation). This approach is O(input.length*pattern.length), which is faster - immensely so in the worst cases - but cannot perform backrefs or captures since it doesn't keep track of how it got to a state.

Spencer's approach is a hybrid, compiling some portions of a pattern to the NFA approach, and using backtracking only where necessary for captures that were actually requested. This is often a substantial win (see TCL's position in the regex-dna shootout).

like image 11
puetzk Avatar answered Nov 10 '22 22:11

puetzk


Look this: http://www.pcre.org/

It is a VERY nice lib!

like image 3
André A. G. Scotá Avatar answered Nov 10 '22 22:11

André A. G. Scotá