
Performing a Regex search and Replace on a std::string

I have a pattern 'XYZ\d\d' and a largish string in which this pattern can occur many times.

My objective is to find all instances of the pattern in the string and then to replace every character of each match with the letter 'A' in the original string.

So far I have the following, but it produces an error:

#include <iostream>
#include <regex>
#include <string>

int main() {
    std::regex  exp("XYZ\\d\\d");
    std::smatch res;
    std::string str = " XYZ111 d-dxxxxxxx XYZ222 t-nyyyyyyyyy XYZ333 t-r ";

    auto itr = str.cbegin();

    while (std::regex_search(itr, str.cend(), res, exp)) {

        std::cout << "[" << res[0] << "]" << std::endl;

        for (auto j = res[0].first; j != res[0].second; ++j) {
           *j = 'A';  // Error: j is a const_iterator, so *j is a const reference
        }

        itr += res.position() + res.length();
    }

    std::cout << std::endl;

    std::cout << "mod: " << str << std::endl;

    return 0;
}

I'm not sure what the correct approach is when using the C++11 regex facilities to accomplish this.

Also, is there something like regex_replace that takes a functor, so that one can specify how each match should be changed?

Torrie Merk asked Feb 11 '17 05:02


2 Answers

Since you have the position and length of each match, you could use those to do the replacement. Alternatively, if you just want to get rid of the error, you can instantiate std::match_results with the non-const iterator (the standard library's predefined instantiations, such as std::smatch, use const iterators).

#include <iostream>
#include <regex>
#include <string>

int main() {
    using strmatch = std::match_results<std::string::iterator>;

    std::regex  expr("XYZ\\d\\d");
    strmatch res;
    std::string str = " XYZ111 d-dxxxxxxx XYZ222 t-nyyyyyyyyy XYZ333 t-r ";

    auto itr = str.begin();

    while (std::regex_search(itr, str.end(), res, expr)) {

        std::cout << "[" << res[0] << "]" << std::endl;

        for (auto j = res[0].first; j != res[0].second; ++j) {
           *j = 'A';  // OK: j is a non-const std::string::iterator
        }

        itr += res.position() + res.length();
    }

    std::cout << std::endl;

    std::cout << "mod: " << str << std::endl;

    return 0;
}
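
The first option mentioned in this answer, using the match position and length, could look roughly like the sketch below. It keeps std::smatch and writes through the string itself instead of through the match iterators. mask_matches is a hypothetical helper name, and the sketch assumes each match should be overwritten with a fill character without changing the string's length.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper for illustration: overwrite every match of `pattern`
// in `text` with `fill`, keeping the overall length unchanged.
std::string mask_matches(std::string text, const std::regex& pattern, char fill) {
    std::smatch res;
    std::size_t offset = 0;  // index in `text` where the next search starts

    while (std::regex_search(
               text.cbegin() + static_cast<std::string::difference_type>(offset),
               text.cend(), res, pattern)) {
        if (res.length() == 0) break;  // avoid looping forever on an empty match

        // res.position() is relative to the start of this search, so add offset.
        const std::size_t start = offset + static_cast<std::size_t>(res.position());
        const std::size_t len = static_cast<std::size_t>(res.length());

        // Replace `len` characters at `start` with `len` copies of `fill`.
        // `res` is not read again before the next search overwrites it.
        text.replace(start, len, len, fill);
        offset = start + len;
    }
    return text;
}
```

Calling mask_matches(str, std::regex("XYZ\\d\\d"), 'A') on the string from the question returns a copy in which each five-character match has become "AAAAA".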
PeterT answered Nov 16 '22 11:11


You need a global regular-expression-based substitution. Here are three ways to do this without any explicit loops (of course, the regex replace implementations still loop internally):

#include <iostream>
#include <string>
#include <regex> // std::regex
#include <pcrecpp.h> // pcrecpp::RE -- needs "-lpcrecpp -lpcre"
#include <pcrscpp.h> // pcrscpp::replace -- needs "-lpcrscpp -lpcre"

int main() {
    std::regex std_rx (R"del(XYZ\d\d)del");
    pcrecpp::RE pcrecpp_rx (R"del(XYZ\d\d)del");
    pcrscpp::replace pcrscpp_rs(R"del(s/XYZ\d\d/A/g)del");
    std::string str = " XYZ111 d-dxxxxxxx XYZ222 t-nyyyyyyyyy XYZ333 t-r ";

    std::cout << "std::regex way: " << std::regex_replace (str, std_rx, "A") << std::endl
              << "pcrecpp way: ";

    std::string buffer(str);
    pcrecpp_rx.GlobalReplace("A", &buffer);

    std::cout << buffer << std::endl
              << "pcrscpp way: ";

    pcrscpp_rs.replace_store(str);
    std::cout << pcrscpp_rs.replace_result << std::endl;

    return 0;
}

Results (each five-character match is collapsed to a single 'A'):

std::regex way:  A1 d-dxxxxxxx A2 t-nyyyyyyyyy A3 t-r
pcrecpp way:  A1 d-dxxxxxxx A2 t-nyyyyyyyyy A3 t-r
pcrscpp way:  A1 d-dxxxxxxx A2 t-nyyyyyyyyy A3 t-r

std::regex requires C++11 and is about twice as slow as PCRE on simple patterns (see this answer), and I expect it to fare worse on more complicated ones, but it doesn't require any additional libraries as long as you use a C++11 compiler. PCRECPP is a C++ wrapper for PCRE written by Google. PCRSCPP is my own wrapper around PCRE that provides Perl-like regular-expression-based substitution, and hence is much more feature-rich than PCRECPP in this respect.
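
On the functor part of the question: the standard std::regex_replace overloads take a format string rather than a callable, but a small helper built on std::sregex_iterator can fill that gap. The sketch below is only one way to do it. regex_replace_with and mask_with_a are hypothetical names introduced here, not library functions.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not part of the standard library): builds a new string
// in which each match of `pattern` is replaced by whatever `fn` returns for it.
template <typename Fn>
std::string regex_replace_with(const std::string& text,
                               const std::regex& pattern, Fn fn) {
    std::string out;
    auto tail = text.cbegin();  // end of the previous match

    for (std::sregex_iterator it(text.cbegin(), text.cend(), pattern), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        out.append(tail, m[0].first);  // copy the text between matches unchanged
        out += fn(m);                  // append the callback's replacement
        tail = m[0].second;
    }
    out.append(tail, text.cend());     // copy whatever follows the last match
    return out;
}

// Example callback: a run of 'A' exactly as long as the match.
std::string mask_with_a(const std::smatch& m) {
    return std::string(static_cast<std::size_t>(m.length()), 'A');
}
```

With the pattern XYZ\d\d and the string from the question, regex_replace_with(str, std_rx, mask_with_a) turns each match into "AAAAA" rather than a single 'A', which is what the question originally asked for. A lambda works equally well as the third argument.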

Alex Potapenko answered Nov 16 '22 11:11