I compared with Linux C regex library,
#include <iostream>
#include <chrono>
#include <regex.h>
int main()
{
const int count = 100000;
regex_t exp;
int rv = regcomp(&exp, R"_(([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?)_", REG_EXTENDED);
if (rv != 0) {
std::cout << "regcomp failed with " << rv << std::endl;
}
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < count; i++)
{
regmatch_t match;
const char *sz = "http://www.abc.com";
if (regexec(&exp, sz, 1, &match, 0) == 0) {
// std::cout << sz << " matches characters " << match.rm_so << " - " << match.rm_eo << std::endl;
} else {
// std::cout << sz << " does not match" << std::endl;
}
}
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
std::cout << elapsed.count() << std::endl;
return 0;
}
The result is roughly 60-70 milliseconds on my testing machine.
Then I used libc++'s library,
#include <iostream>
#include <chrono>
#include <regex>
int main()
{
const int count = 100000;
std::regex rgx(R"_(([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?)_", std::regex_constants::extended);
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < count; i++)
{
std::cmatch match;
const char sz[] = "http://www.abc.com";
if (regex_search(sz, match, rgx)) {
} else {
}
}
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
std::cout << "regex_search: " << elapsed.count() << std::endl;
start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < count; i++)
{
const char sz[] = "http://www.abc.com";
if (regex_match(sz, rgx)) {
} else {
}
}
end = std::chrono::high_resolution_clock::now();
elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
std::cout << "regex_match: " << elapsed.count() << std::endl;
return 0;
}
The result is roughly 2 seconds for both regex_search & regex_match. This is about 30 times slower than C's regex.h library.
Is there anything wrong with my comparison? Is C++'s regex library not suitable for high performance case?
I can understand it's slow because there's no optimization in c++'s regex library yet, but 30 times slower is just too much.
Thanks.
Hi all,
Thanks for answering.
Sorry for my mistake I was using [] for C too but later I changed, and forgot to change C++ code.
I made two changes,
This is still a bit slow, but not as much as the original comparison.
The current std::regex design and implementation are slow, mostly because the RE pattern is parsed and compiled at runtime. Users often don't need a runtime RE parser engine as the pattern is known during compilation in many common use cases.
Boost. Regex allows you to use regular expressions in C++. As the library is part of the standard library since C++11, you don't depend on Boost. Regex if your development environment supports C++11.
If you take a look at http://llvm.org/svn/llvm-project/libcxx/trunk/include/regex you'll see this implementation of regex_match
is layered atop regex_search
, and all overloads extract sub-expression match positions even if only into local temporaries that are thrown away. regex_search
uses a vector
of __state
objects that have .resize()
called on them so are presumably vectors too - all heap allocations and unnecessary when the subexpression matches aren't wanted, but would need to be tracked to support \1
etc in perl-style extensions to regular expressions: the old regcomp
/regexec
C functions didn't provide those extended features never have to do this extra work. Of course it would be nice if the clang implementation checked the regular expression's need for tracking matches during compilation and called leaner, faster functions to match when possible, but I guess they're just starting with support for the general case.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With