Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the Time Complexity of Average Regex algorithms?

I'm not new to using regular expressions, and I understand the basic theory they're based on--finite state machines.

I'm not so good at algorithmic analysis though and don't understand how a regex compares to say, a basic linear search. I'm asking because on the surface it seems like a linear array search. (If the regex is simple.)

Where could I go to learn more about implementing a regex engine?

like image 509
avgvstvs Avatar asked May 05 '11 02:05

avgvstvs


People also ask

What is the time complexity of a regex?

Time Complexity of Regex is O(MxN).

What algorithm does regex use?

Most library implementations of regular expressions use a backtracking algorithm that can take an exponential amount of time on some inputs.

What is the best time complexity of an algorithm?

1. O(1) has the least complexity. Often called “constant time”, if you can create an algorithm to solve the problem in O(1), you are probably at your best.

How efficient is regex?

Regular Expressions are efficient in that one line of code can save you writing hundreds of lines. But they're normally slower (even pre-compiled) than thoughtful hand written code simply due to the overhead. Generally the simpler the objective the worse Regular Expressions are. They're better for complex operations.


2 Answers

This is one of the most popular outlines: Regular Expression Matching Can Be Simple And Fast . Running a DFA-compiled regular expression against a string is indeed O(n), but can require up to O(2^m) construction time/space (where m = regular expression size).

like image 142
porges Avatar answered Oct 08 '22 18:10

porges


Are you familiar with the term Deterministic/Non-Deterministic Finite Automata?

Real regular expressions (when I say real I'm refering to those regex that recognize Regular Languages, and not the regex that almost every programming language include with backreferences, etc) can be converted into a DFA/NFA and both can be implemented in a mechanical way in a programming language (a NFA can be converted into a DFA)

What you have to do is:

  1. Find a way to convert a regex into an automaton
  2. Implement the recognition of the automaton in the programming language of your preference

That way, given a regex you can convert it to a DFA and run it to see if it matches or not a specified text.

This can be implemented in O(n), because DFA don't go backward (like a Turing Machine), so it matches the string or not. That is supposing you won't take in count overlapped matches, otherwise you will have to go back and start matching again...

like image 27
Oscar Mederos Avatar answered Oct 08 '22 18:10

Oscar Mederos