I am learning regex in C++ and Java, so I ran a performance test on C++11 regex and Java regex with the same expression and the same number of inputs. Strangely, Java regex is faster than C++11 regex. Is there anything wrong with my code? Please correct me.
Java code:
import java.util.regex.*;
public class Main {
private final static int MAX = 1_000_000;
public static void main(String[] args) {
long start = System.currentTimeMillis();
Pattern p = Pattern.compile("^[\\w._]+@\\w+\\.[a-zA-Z]+$");
for (int i = 0; i < MAX; i++) {
p.matcher("[email protected]").matches();
}
long end = System.currentTimeMillis();
System.out.print(end-start);
}
}
C++ code:
#include <iostream>
#include <Windows.h>
#include <regex>
using namespace std;
int main()
{
long long start = GetTickCount64();
regex pat("^[\\w._]+@\\w+\\.[a-zA-Z]+$");
for (long i = 0; i < 1000000; i++) {
regex_match("[email protected]", pat);
}
long long end = GetTickCount64();
cout << end - start;
return 0;
}
Performance:
Java -> 1003ms
C++ -> 124360ms
Made the C++ sample portable:
#include <iostream>
#include <chrono>
#include <regex>
using C = std::chrono::high_resolution_clock;
using namespace std::chrono_literals;
int main()
{
auto start = C::now();
std::regex pat("^[\\w._]+@\\w+\\.[a-zA-Z]+$");
for (long i = 0; i < 1000000; i++) {
regex_match("[email protected]", pat);
}
std::cout << (C::now() - start)/1.0ms;
}
On Linux, compiled with clang++ -std=c++14 -march=native -O3 -o clang ./test.cpp, this takes 595.970 ms. See also Live On Wandbox
The Java version runs in 561 ms on the same machine.
Update: Boost Regex runs much faster; see the comparative benchmark below.
Caveat: synthetic benchmarks like these are very error-prone. To give just one example, the compiler might notice that the loop has no observable side effects and optimize it away entirely.
Using Boost 1.67 and the Nonius Micro-Benchmarking Framework, we can see that Boost's regex implementations are considerably faster.
The detailed sample data is available interactively online: https://plot.ly/~sehe/25/
Code Used
#include <iostream>
#include <regex>
#include <boost/regex.hpp>
#include <boost/xpressive/xpressive_static.hpp>
#define NONIUS_RUNNER
#include <nonius/benchmark.h++>
#include <nonius/main.h++>
template <typename Re>
void test(Re const& re) {
regex_match("[email protected]", re);
}
static const std::regex std_normal("^[\\w._]+@\\w+\\.[a-zA-Z]+$");
static const std::regex std_optimized("^[\\w._]+@\\w+\\.[a-zA-Z]+$", std::regex::ECMAScript | std::regex::optimize);
static const boost::regex boost_normal("^[\\w._]+@\\w+\\.[a-zA-Z]+$");
static const boost::regex boost_optimized("^[\\w._]+@\\w+\\.[a-zA-Z]+$", static_cast<boost::regex::flag_type>(boost::regex::ECMAScript | boost::regex::optimize));
static const auto boost_xpressive = []{
using namespace boost::xpressive;
return cregex { bos >> +(_w | '.' | '_') >> '@' >> +_w >> '.' >> +alpha >> eos };
}();
NONIUS_BENCHMARK("std_normal", [] { test(std_normal); })
NONIUS_BENCHMARK("std_optimized", [] { test(std_optimized); })
NONIUS_BENCHMARK("boost_normal", [] { test(boost_normal); })
NONIUS_BENCHMARK("boost_optimized", [] { test(boost_optimized); })
NONIUS_BENCHMARK("boost_xpressive", [] { test(boost_xpressive); })
Note: here is the output of the HotSpot JVM JIT compiler:
- http://stackoverflow-sehe.s3.amazonaws.com/fea76143-b712-4df9-97c3-4725b2f9e695/disasm.a.xz
This was generated using:
LD_PRELOAD=/home/sehe/Projects/stackoverflow/fcml-1.1.3/example/hsdis/.libs/libhsdis-amd64.so ./jre1.8.0_171/bin/java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Main 2>&1 > disasm.a