When a string is being matched against a regular expression, what's going on behind the scenes?

Tags: regex

I'd be interested to know what kinds of algorithms are used for matching, and how they are optimised, because I imagine that some regexes could produce a vast number of possible matches, which could cause serious problems in a poorly written regex parser.

Also, I recently discovered the concept of ReDoS. Why do regexes such as (a|aa)+ or (a|a?)+ cause problems?

EDIT: I have used them mostly in C# and Python, so that's what I had in mind when asking the question. I assume Python's engine is written in C like the rest of the interpreter, but I have no idea about C#.

lavelle asked Jun 07 '11 18:06


2 Answers

I find http://www.regular-expressions.info has really useful info about regular expressions.

The author specifically talks about catastrophic backtracking in regular expressions.
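
To give a rough feel for why a pattern like (a|aa)+ blows up, here is a minimal sketch of a naive backtracking matcher for that one pattern, anchored at both ends of the string. The name count_steps is made up for this example, and this is not how Python's re module or the .NET engine is actually implemented; it only counts how many alternatives a simple backtracker would try.

```python
def count_steps(text):
    """Naively match ^(a|aa)+$ against text by backtracking.

    Returns (matched, steps), where steps is the number of times an
    alternative ("a" or "aa") was tried. Hypothetical helper, written
    only to illustrate backtracking; real engines are more elaborate.
    """
    steps = 0

    def match_from(pos, matched_once):
        nonlocal steps
        # Success: the group matched at least once and we are at the end.
        if matched_once and pos == len(text):
            return True
        # Try each alternative in order; fall through to the next one
        # (backtrack) if the rest of the string cannot be matched.
        for alt in ("a", "aa"):
            steps += 1
            if text.startswith(alt, pos) and match_from(pos + len(alt), True):
                return True
        return False

    return match_from(0, False), steps


if __name__ == "__main__":
    # A trailing "b" forces the match to fail, so every way of splitting
    # the run of a's into "a" and "aa" pieces gets explored.
    for n in (5, 10, 15, 20):
        matched, steps = count_steps("a" * n + "b")
        print(n, matched, steps)
```

On a string that matches, the sketch finishes almost immediately. On a run of a's followed by a b, the step count grows roughly like the Fibonacci sequence, so each extra a multiplies the work by about 1.6, which is the kind of growth a ReDoS attack exploits.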

marto answered Oct 20 '22 10:10


RegexBuddy has a debugger page which "offers you a unique view inside a regular expression engine".

http://www.regexbuddy.com/debug.html

manojlds answered Oct 20 '22 10:10