Matching a fixed position in a string using C++ regex

Tags: c++, regex

Do C++ regular expressions have a way to refer to a fixed position within a string? I am looking for a notation that is (fictitiously) shown in the following example as @n, where n is the index of the next character:

string = "hello12345";
re = "([a-z]+[0-9]+)(@7)(.+)";
// Match: ["hello12","345"]

Update: I cannot just repeat a fixed number of characters because I do not know how many of them will match [a-z] and [0-9].

DYZ asked May 04 '26 08:05


2 Answers

Don't do it all with a regular expression. Use std::string::substr() to split the input into two substrings, the first 7 characters and the rest. Then check that the first substring matches the regexp ^[a-z]+\d+$.

Barmar answered May 05 '26 21:05


Here's one idea combining both negative and positive lookbehinds:

^([a-z]+\d+)(?<!.{8})(?<=.{7})(.+)
             ^^^^^^^  ^^^^^^^
                a        b
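  • a - Negative lookbehind for 8 characters: fails if 8 or more characters precede the current position
  • b - Positive lookbehind for 7 characters: succeeds only if at least 7 characters precede the current position

Together they require that exactly 7 characters precede the point where the second group starts.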


C++ example using boost::regex, since it supports lookbehind and std::regex currently does not (std::regex only supports lookahead):

#include <boost/regex.hpp>
#include <iostream>
#include <string>
#include <vector>

int main() {
    boost::regex re(R"aw(^([a-z]+\d+)(?<!.{8})(?<=.{7})(.+))aw",
                    boost::regex_constants::ECMAScript);

    std::vector<std::string> strs{
        "hello12345",  //
        "a12345678",   //
        "ab12345678",  //
        "abc1234567",  //
        "abcd123456",  //
        "abcde12345",  //
        "abcdef1234",  //
        "abcdefg234",  //
        "a2cdef1234",  //
        "abcdefg1234", //
        "123456789"    //
    };

    std::cout << std::boolalpha;

    for(auto& test : strs) {
        boost::smatch what;
        bool got_match = boost::regex_match(test, what, re);
        std::cout << test << '\t' << got_match << '\n';
        if(got_match) {
            std::cout << " [\"" << what[1] << "\", \"" << what[2] << "\"]\n";
        }
    }
}

Output:

hello12345      true
 ["hello12", "345"]
a12345678       true
 ["a123456", "78"]
ab12345678      true
 ["ab12345", "678"]
abc1234567      true
 ["abc1234", "567"]
abcd123456      true
 ["abcd123", "456"]
abcde12345      true
 ["abcde12", "345"]
abcdef1234      true
 ["abcdef1", "234"]
abcdefg234      false
a2cdef1234      false
abcdefg1234     false
123456789       false

As noted by bobble bubble in a comment, this can be simplified further by anchoring the positive lookbehind and dropping the negative lookbehind:

^([a-z]+\d+)(?<=^.{7})(.+)
                ^
              anchor added


Ted Lyngmo answered May 05 '26 20:05