I am working with std::regex
, and whilst reading about the various constants defined in std::regex_constants
, I came across std::optimize
, reading about it, it sounds like it is useful in my application (I only need one instance of the regex, initialized at the beginning, but it is used multiple times throughout the loading process).
According to the working paper n3126 (pg. 1077), std::regex_constants::optimize
:
Specifies that the regular expression engine should pay more attention to the speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output.
I was curious as to what type of optimization would be performed, but there doesn't seem to be much literature about it (indeed, it seems to be undefined), and one of the only things I found was at cppreference.com, which stated that std::regex_constants::optimize
:
Instructs the regular expression engine to make matching faster, with the potential cost of making construction slower. For example, this might mean converting a non-deterministic FSA to a deterministic FSA.
However, I have no formal background in computer science, and whilst I'm aware of the basics of what an FSA is, and understand the basic difference between a deterministic FSA (each state only has one possible next state), and a non-deterministic FSA (with multiple potential next states); I do not understand how this improves matching time. Also, I would be interested to know if there are any other optimizations in various C++ Standard Library implementations.
The Regular Expression is obviously all about string, it needs to play with string everywhere. std::regex plays a lot with std::string , that's why the performance of std::regex is not good. std::regex can't do much thing at compile time because it uses std::string , this could cause the performance issue too.
Good regular expressions are often longer than bad regular expressions because they make use of specific characters/character classes and have more structure. This causes good regular expressions to run faster as they predict their input more accurately.
There's some useful information on the topic of regex engines and performance trade offs (far more than can fit in a stackoverflow answer) in Mastering Regular Expressions by Jeffrey Friedl.
It's worth noting that Boost.Regex, which was the source for N3126, documents optimize
as "This currently has no effect for Boost.Regex."
P.S.
indeed, it seems to be implementation-defined
No, it's unspecified. Implementation-defined means an implementation is required to define the choice of behaviour. Implementations are not required to document how their regex engines are implemented or what (if any) difference the optimize
flag makes.
P.S. 2
in various STL implementations
std::regex
is not part of the STL, the C++ Standard Library is not the same thing as the STL.
See http://swtch.com/~rsc/regexp/regexp1.html for a nice explanation on how NFA based regex implementations can avoid the exponential backtracking that occurs in DFA matchers in certain circumstances.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With