I'm looking for a performance comparison between Perl and Boost regular expressions. I need to design a piece of code that relies very heavily on regular expressions, and I can choose between:

1. running it through Boost.Regex
2. launching a Perl interpreter and doing the work in Perl

I know Perl is known for its optimized string processing. However, I can't find a performance comparison with the Boost regex library. Do you know of any such comparison? Thanks

Regular expressions performance: Boost vs. Perl

6 Answers

The startup cost of running a Perl interpreter from within your application (via the system function, I presume) will outweigh any benefit you gain from using Perl's regex engine. The exception would be a VERY complicated regular expression that Perl's regex implementation happens to be optimised for but Boost's regex engine isn't.

The real answer is that I do not know of any such comparison, but Perl's regular expression facilities are not necessarily the fastest. See here for some information about an algorithm that beats Perl's regular expression engine on some expressions.

EDIT: It is possible to avoid the cost of starting a full Perl interpreter by linking against libperl or by using libPCRE. And using Boost will probably give you more flexibility and performance tuning options if you need them.

Final Note: There are no known direct performance comparisons between Boost.Regex and Perl's regex engine. The solution is to try both and see which performs better in the OP's specific situation.

(Edit: There is now a good comparison between Boost and PCRE. See http://www.boost.org/doc/libs/1_41_0/libs/regex/doc/gcc-performance.html)
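The "try both and measure" advice can be sketched as a small timing harness. This is only an illustration under stated assumptions: it uses std::regex as a standard-library stand-in (Boost.Regex has a similar but not identical interface, and its performance will differ), and the names `TimingResult` and `time_regex_search`, along with any pattern and iteration count you pass in, are hypothetical rather than taken from the answer.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Result of one timing run: total number of matching lines across all
// iterations, and the elapsed wall-clock time for the matching loop.
struct TimingResult {
    std::size_t matches;
    std::chrono::nanoseconds elapsed;
};

// Counts how many lines contain a match for `pattern`, repeating the scan
// `iterations` times. The regex is compiled once, before the clock starts,
// so only matching is timed (not pattern compilation).
// NOTE: std::regex is an assumption here, standing in for boost::regex.
TimingResult time_regex_search(const std::string& pattern,
                               const std::vector<std::string>& lines,
                               int iterations) {
    const std::regex re(pattern,
                        std::regex::ECMAScript | std::regex::optimize);
    std::size_t matches = 0;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        for (const auto& line : lines) {
            if (std::regex_search(line, re)) {
                ++matches;
            }
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    return {matches,
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```

To compare engines, one would run the same patterns and the same input through an equivalent function for each candidate, using data that resembles the real workload, since results on toy input may not carry over.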


answered Oct 02 '22 12:10

barkmadley

If you haven't seen it yet, there's a regexp benchmark in the Great Language Shootout. It doesn't rank Perl very high at all. A Boost implementation using boost::xpressive, which compiles the expression at compile time, is ranked first. However, this is a microbenchmark, so it is probably not representative of general regular expression speed, but it is still worth a look.

Surprisingly enough, the fastest regular expression engine by far is apparently Google Chrome's V8 JavaScript JIT (it almost beats GCC in wall-clock time while using just a single CPU core).

answered Oct 02 '22 13:10

intgr

If your regular expressions are fixed at compile time, you could also consider Boost.Xpressive. It allows one to write regexes as expression templates that are parsed at compile time.
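Xpressive's static regexes require the Boost headers, so they are not shown here. As a standard-library approximation of the "pattern is known ahead of time" idea, the sketch below builds a std::regex once in a function-local static and reuses it on every call. Note the difference: this is still run-time construction (paid once), not the compile-time parsing that Xpressive offers. The function name `looks_like_iso_date` and the date pattern are hypothetical.

```cpp
#include <regex>
#include <string>

// Returns true if the whole of `text` has the shape NNNN-NN-NN,
// e.g. "2009-10-02". The pattern is only an illustrative assumption.
//
// The regex is a function-local static: it is constructed the first time
// the function runs and reused afterwards, so the cost of compiling the
// pattern is not paid again on later calls.
bool looks_like_iso_date(const std::string& text) {
    static const std::regex date_re(R"(\d{4}-\d{2}-\d{2})");
    return std::regex_match(text, date_re);
}
```

Whether paying the compilation cost once at run time is good enough, or whether a compile-time approach is worth the extra dependency, is something only measurement on the real workload can settle.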

answered Oct 02 '22 14:10

Éric Malenfant

Start with the simplest solution. Decide how fast it needs to be for your application. Then measure the speed. If it's too slow, try the harder solution. Measure again. Repeat as necessary.

While my gut agrees with most of the other answers saying that starting the interpreter will be more expensive, you'll never know until you measure.

There's "fastest possible" and "fast enough for your application". Don't add complexity to get the former if you already have the latter.

answered Oct 02 '22 12:10

Adrian McCarthy

Unless your regex is insanely complex (for which Perl's regex engine is incredibly fast, by the way), then as others have said, your overhead is in interpreter startup. On the other hand, you could quite easily run a persistent Perl process that acts as a regex server.

answered Oct 02 '22 12:10

singingfish

If you really need speed, you can get a regex content coprocessor. I know of a few vendors. Titanic makes a range of processors. Another is made by Cavium. And finally, LSI bought out a smaller company and is shipping a line of regular expression matching processors.

These systems can execute thousands of regular expressions in parallel, rather than one at a time. The most expensive part of using them is moving data to them and back, and dealing with block limits, etc.

But if performance is a concern, you might want to try these out.

answered Oct 02 '22 13:10

Erik Aronesty

Tags: c++, performance, regex, perl

Oren S
