Static regex object or does it matter?

Tags: c++, regex

Suppose I have the following function:

bool IsNumber(std::string const& str)
{
    return std::regex_match(str, std::regex{"\\d+"});
}

I am constructing the std::regex on every call. Is there a documented performance overhead to doing this? Would it be better to make it static instead, like below?

bool IsNumber(std::string const& str)
{
    static std::regex const number_regex{"\\d+"};
    return std::regex_match(str, number_regex);
}

Or does it not really matter?

void.pointer asked Aug 28 '15 13:08


1 Answer

The compiler might not be able to tell that constructing the std::regex produces the same object every time the function is called (for example, the constructor could read a static or global variable), so the safe choice for it is to construct the object on every call. On the other hand, modern compilers are very clever, and one might analyze the constructor deeply enough to prove the result is constant and optimize the repeated construction away. In any case: profile it. For example, write a loop and measure the time (with std::chrono) for a few thousand calls, enough that the run takes at least on the order of seconds.
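The profiling approach suggested in the comment can be wrapped in a small reusable helper. This is a minimal sketch, not code from the comment or the answer: TimeCallsMs is a hypothetical name. It uses std::chrono::steady_clock, which is monotonic and therefore better suited to measuring intervals than system_clock, whose time can be adjusted while the program runs.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (illustration only): calls f() `iterations` times and
// returns the elapsed wall-clock time in whole milliseconds.
template <typename F>
long long TimeCallsMs( F&& f, std::size_t iterations )
{
    // A zero-iteration run would measure nothing useful
    assert( iterations > 0 );

    // steady_clock never jumps backwards, unlike system_clock
    auto const start = std::chrono::steady_clock::now();
    for( std::size_t i = 0; i < iterations; i++ )
        f();
    auto const elapsed = std::chrono::steady_clock::now() - start;

    return std::chrono::duration_cast<std::chrono::milliseconds>( elapsed ).count();
}
```

For example, TimeCallsMs( [&] { hits += IsNumber( "3141592" ); }, 100000 ) times 100000 calls of the question's IsNumber. Accumulating each result in a variable that is used afterwards (hits here) makes it harder for the optimizer to discard the calls being measured.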

I've written a very simple test program to profile it:

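#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <regex>
#include <string>
#include <vector>

// A: constructs (compiles) the regex on every call
bool IsNumberA( std::string const& str )
{
    return std::regex_match( str, std::regex { "\\d+" } );
}

// B: reuses one regex object, constructed once during static initialization
static std::regex const number_regex( "\\d+" );
bool IsNumberB( std::string const& str )
{
    return std::regex_match( str, number_regex );
}

int main()
{
    std::size_t const count = 100000;

    // Roughly half of the inputs are numbers, the rest are not
    std::vector<std::string> aRandomStrings;
    aRandomStrings.reserve( count );
    for( std::size_t i = 0; i < count; i++ )
        aRandomStrings.push_back( ( std::rand() % 2 == 0 ) ? "nonumberatall" : "3141592" );

    // steady_clock is monotonic, which makes it the right clock for measuring intervals
    auto time = std::chrono::steady_clock::now();

    std::size_t numberCountA = 0;
    for( std::size_t i = 0; i < count; i++ )
        if( IsNumberA( aRandomStrings[i] ) )
            numberCountA++;

    auto takenTimeA = std::chrono::duration_cast<std::chrono::milliseconds>
        ( std::chrono::steady_clock::now() - time );
    time = std::chrono::steady_clock::now();    // reset

    std::size_t numberCountB = 0;
    for( std::size_t i = 0; i < count; i++ )
        if( IsNumberB( aRandomStrings[i] ) )
            numberCountB++;

    auto takenTimeB = std::chrono::duration_cast<std::chrono::milliseconds>
        ( std::chrono::steady_clock::now() - time );

    // count() returns an implementation-defined integer type, so cast it for printf.
    // Printing the match counts keeps the loops' results observable to the optimizer.
    std::printf( "took %lld ms for A, %lld ms for B (%zu / %zu numbers)\n",
                 static_cast<long long>( takenTimeA.count() ),
                 static_cast<long long>( takenTimeB.count() ),
                 numberCountA, numberCountB );
}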

Results

I compiled it both without and with optimizations (MSVC), to see whether the compiler is smart enough to avoid the repeated construction on its own.

Unoptimized: A 6283 ms, B 41 ms

Optimized: A 268 ms, B 85 ms

We can clearly see a massive boost in performance when using a predefined variable (B). Why B is slower in the optimized build than in the unoptimized one is not clear to me; the measured times may simply be too short to be reliable. The random generator might also introduce some effects I haven't accounted for.
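The speed-up comes from how often the "\\d+" pattern is compiled into a std::regex object. The sketch below makes that visible by counting constructions for both versions from the question. It is an illustration under stated assumptions: g_regexConstructions, MakeNumberRegex and CountConstructions are hypothetical names introduced here, not part of the question, the answer, or the standard library.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical counter (illustration only): how many regex objects were built.
static int g_regexConstructions = 0;

// Hypothetical helper: builds the regex and records each construction.
std::regex MakeNumberRegex()
{
    ++g_regexConstructions;
    return std::regex{ "\\d+" };
}

// Like the question's first version: the pattern is compiled on every call.
bool IsNumberPerCall( std::string const& str )
{
    return std::regex_match( str, MakeNumberRegex() );
}

// Like the question's second version: the function-local static is initialized
// once, on the first call (thread-safe since C++11), and reused afterwards.
bool IsNumberStatic( std::string const& str )
{
    static std::regex const number_regex = MakeNumberRegex();
    return std::regex_match( str, number_regex );
}

// Hypothetical helper: resets the counter, calls the chosen version `calls`
// times and returns how many regex objects were constructed meanwhile.
int CountConstructions( bool useStatic, int calls )
{
    g_regexConstructions = 0;
    for( int i = 0; i < calls; i++ )
    {
        bool const isNumber = useStatic ? IsNumberStatic( "3141592" )
                                        : IsNumberPerCall( "3141592" );
        assert( isNumber );  // "3141592" consists only of digits
        (void)isNumber;      // avoid an unused-variable warning when NDEBUG is set
    }
    return g_regexConstructions;
}
```

CountConstructions( false, 1000 ) returns 1000, while the first CountConstructions( true, 1000 ) returns 1 and any later call returns 0, because the static was already initialized. Note that the benchmark above uses a namespace-scope static instead of the question's function-local one; both construct the regex only once, and the function-local form additionally defers construction until the first call.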

Zacharias answered Nov 07 '22 03:11