I'm thinking about implementing a regular expression parser in a C library I'm developing. Now, the question is: is there any open source code that I could use verbatim or with as few changes as possible? My expectations regarding the code are:
Are there any ready-made solutions that you could recommend? I was looking at PCRE for C and it looks like it has everything that's available in PHP (which rules), but the size (1.4MB DL) is a bit intimidating. Do you think it's a solid bet? Or are there other options worth considering?
[EDIT]
The library I'm developing is open source, BSD licence.
Regex matching is expensive and can easily be the most CPU-intensive part of a program. A non-matching input can be even more expensive to check than a matching one, because a backtracking engine has to exhaust every alternative before it can report failure.
A regular expression is a sequence of characters that describes a pattern to match against a string. It can be used for searching text and validating input. Remember, regular expressions are not the property of any particular language. In C, a well-known option is the POSIX regex API (regcomp, regexec and regfree, declared in regex.h), which POSIX systems provide as part of the C library.
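For reference, here is a minimal sketch of what using the POSIX API looks like. It assumes a POSIX system where regex.h is available; the helper name posix_match is made up for this example and is not part of any library.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches `pattern`
 * (POSIX extended syntax), 0 if it does not, and -1 if the
 * pattern fails to compile. */
int posix_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_NOSUB: we only want a yes/no answer, not submatch offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re); /* always release the compiled pattern */

    return rc == 0 ? 1 : 0;
}
```

Calling posix_match("^[a-z]+[0-9]+$", "abc123") compiles the pattern, runs it once and frees it again. In real code you would probably compile once and reuse the regex_t across many calls, since compilation is the costly step.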
Being more specific with your regular expressions, even if they become much longer, can make a world of difference in performance. The fewer characters you scan to determine the match, the faster your regexes will be.
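To make that concrete, here is a small sketch comparing a loose pattern with a more specific one for the same job (spotting error lines in a log). The log format, the patterns and the function names are all assumptions made up for this example, and it uses the POSIX regex.h API. How much faster the specific pattern is depends on the engine and the input, so measure before relying on it.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Shared helper: 1 on match, 0 on no match, -1 on a bad pattern. */
static int line_matches(const char *pattern, const char *line)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && line != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Loose: unanchored, and ".*" lets the engine wander over the whole line. */
int is_error_line_loose(const char *line)
{
    return line_matches(".*ERROR.*", line);
}

/* Specific: anchored at the start with fixed-width date fields, so a
 * line that does not fit can be rejected within the first few characters. */
int is_error_line_specific(const char *line)
{
    return line_matches("^[0-9]{4}-[0-9]{2}-[0-9]{2} ERROR ", line);
}
```

Note that the specific pattern is also more precise: a line that merely mentions the word ERROR somewhere in its message passes the loose check but not the specific one.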
PCRE is so big because regular expressions are hard. And most of it is documentation and support code anyways; it's much smaller when compiled into object code.
RE2, Google's regexp implementation, matches in linear time (O(n), where n is the length of the string), whereas PCRE and most other regexp engines take exponential time in the worst case. Another noteworthy O(n) regexp matcher is flex, but it needs all possible regexps at compile time. If you are looking for something smaller than PCRE, look at the regexp matcher in busybox, or the pattern matcher in Lua.
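To give a feel for how small a matcher can get once you restrict the syntax, here is a sketch in the style of the well-known matcher from Kernighan and Pike's The Practice of Programming. It is not the busybox or Lua code, and the tiny_ names are made up for this example. It only supports literal characters, ".", "*", "^" and "$", and it backtracks, so it does not have RE2's linear-time guarantee.

```c
#include <assert.h>
#include <stddef.h>

static int tiny_matchhere(const char *re, const char *text);

/* Match zero or more occurrences of c, followed by the rest of re. */
static int tiny_matchstar(int c, const char *re, const char *text)
{
    do {
        if (tiny_matchhere(re, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* Match re against the beginning of text. */
static int tiny_matchhere(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;                       /* pattern exhausted: success */
    if (re[1] == '*')
        return tiny_matchstar(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';           /* "$" only matches at the end */
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return tiny_matchhere(re + 1, text + 1);
    return 0;
}

/* Search for re anywhere in text; returns 1 on match, 0 otherwise. */
int tiny_match(const char *re, const char *text)
{
    assert(re != NULL && text != NULL);

    if (re[0] == '^')
        return tiny_matchhere(re + 1, text);
    do {                                /* try every start position */
        if (tiny_matchhere(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```

For example, tiny_match("^ab*c$", "abbbc") succeeds while tiny_match("^ab*c$", "abd") fails. If you need character classes, alternation or captures, you are quickly back in PCRE or POSIX territory.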