Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What happens when you compile regular expressions?

We all know that you can compile your much-used regular expressions into something that performs very good. But what is this witchcraft happening behind the curtains?

I guess that a finite state automaton gets built there, but you must know better than me.

like image 978
mike3996 Avatar asked Apr 20 '11 17:04

mike3996


People also ask

What does compiling a regex do?

In simple terms, We can compile a regular expression into a regex object to look for occurrences of the same pattern inside various target strings without rewriting it.

What is compile regular expression in Python?

compile(pattern, repl, string): We can combine a regular expression pattern into pattern objects, which can be used for pattern matching. It also helps to search a pattern again without rewriting it.

Does compiling regex make it faster?

Regex has an interpreted mode and a compiled mode. The compiled mode takes longer to start, but is generally faster. Some users want both startup time and performance; other users want to run on a platform where JITted code is not allowed, and also want performance.

What will the \$ regular expression match?

\$ will help to find the character "$" available in the content based on the expression flags assigned to the regular expression. Say for example: \$: only find the single "$" in a content \$/g: find the "$" globally available in content.


1 Answers

The details of regular expression compilation vary by implementation. For example, compilation in Python or re2 simply creates an instance of a regular expression object. This object's state machine may be modeled as a graph or virtual machine. Without compilation (example: RE.match(expression, input)), a new regular expression object is created behind the scenes each time match is called. This is needless work if you're going to use an expression more than once.

In C#, one of three things can happen when you compile:

  1. A regular expression object is created (implemented as a virtual machine) similar to Python and re2.
  2. A regular expression object is created and its virtual machine opcodes are compiled to in-memory IL instructions on-the-fly.
  3. A regular expression object is created and its virtual machine opcodes are compiled to disk as IL instructions.

You mention an interest in algorithms. Take a look at Russ Cox's excellent articles for two approaches:

  • Regular Expression Matching Can Be Simple And Fast - describes a graph-based machine
  • Regular Expression Matching: the Virtual Machine Approach - describes a virtual machine implementation
like image 97
Corbin March Avatar answered Oct 08 '22 02:10

Corbin March