
Why isn't `regex!` a wrapper for `Regex::new` to offer the same regex matching speed?

The Rust regex crate offers the regex! syntax extension, which makes it possible to compile a regex while the program itself is being compiled. This is good in two ways:

  • we don't need to do that work during runtime (better program performance)
  • if our regex is malformed, the compiler can tell us during compilation instead of triggering a runtime panic

Unfortunately, the docs say:

WARNING: The regex! compiler plugin is orders of magnitude slower than the normal Regex::new(...) usage. You should not use the compiler plugin unless you have a very special reason for doing so.

This sounds like a completely different regex engine is used for regex! than for Regex::new(). Why isn't regex!() just a wrapper for Regex::new() to combine the advantages from both worlds? As I understand it, these syntax-extension compiler plugins can execute arbitrary code; why not Regex::new()?

Lukas Kalbertodt asked Jan 06 '17 10:01

1 Answer

The answer is very subtle: one feature of the macro is that the result of regex! can be put into static data, like so:

static r: Regex = regex!("t?rust");

The main problem is that Regex::new() uses heap allocations during regex compilation. A static initializer has to be evaluated at compile time and cannot allocate, so making regex! a wrapper would require rewriting the engine behind Regex::new() to also allow static storage. You can also read burntsushi's comment about this issue on reddit.
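To make the restriction concrete, here is a minimal sketch that uses only the standard library. It uses Vec as a stand-in for any heap-allocating type; the names EMPTY and filled are made up for illustration and have nothing to do with the regex crate.

```rust
// A `static` initializer must be evaluable at compile time.
// `Vec::new` is a `const fn` and does not allocate, so this is accepted.
static EMPTY: Vec<u8> = Vec::new();

// The following line is rejected by the compiler, because `vec!` allocates:
// static FILLED: Vec<u8> = vec![1, 2, 3];

/// Allocating at runtime is fine (hypothetical helper for illustration).
fn filled() -> Vec<u8> {
    vec![1, 2, 3]
}

fn main() {
    assert!(EMPTY.is_empty());
    assert_eq!(filled().len(), 3);
}
```

A regex built by Regex::new() is in the same position as the commented-out FILLED line: it needs the heap, so it cannot be produced directly in a static initializer.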


There are some suggestions about how to improve regex!:

  • Drop the static support and just validate the regex string at compile time while still compiling the regex at runtime
  • Keep the static support by using a similar trick as lazy_static! does

As of the beginning of 2017, the developers are focused on stabilizing the standard API to release version 1.0. Since regex! requires a nightly compiler anyway, it has a low priority right now.

However, the compiler-plugin approach could offer even better performance than Regex::new(), which is already very fast: since the regex's DFA could be compiled into code instead of data, it has the potential to run a bit faster and to benefit from compiler optimizations. More research is needed to know for sure.

Lukas Kalbertodt answered Nov 15 '22 18:11