I need to extract integer values from the following text, but only those that sit between the strings "start:" and "end:".
111222 garbage 999888 start: 123456 end: start: 654321 end:
wanted results:
123456
654321
Here is what I have, but I need it to exclude the unknown number of spaces around the integer.
std::regex pattern (tried on RegExr):
(?<=start:)(.*?)(?=end:)
You may use
std::regex reg(R"(start:\s*(\d+)\s*end:)");
See the regex demo.
The pattern start:\s*(\d+)\s*end: matches the literal start:, then zero or more whitespace characters, captures one or more digits into Group 1, and finally matches zero or more whitespace characters followed by the literal end:.
Note that in case you cannot use raw string literals (R"(...)" notation), you may define the pattern with a regular string literal where all backslashes should be doubled: "start:\\s*(\\d+)\\s*end:".
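As a quick sanity check on the two spellings, here is a minimal sketch that compares them as plain strings. The function name sameRegexText is hypothetical and only used for illustration.

```cpp
#include <string>

// Hypothetical helper: returns true when the raw-literal and the
// escaped-literal spellings of the pattern contain the same characters.
bool sameRegexText() {
    const std::string raw = R"(start:\s*(\d+)\s*end:)";      // raw string literal
    const std::string escaped = "start:\\s*(\\d+)\\s*end:";  // doubled backslashes
    return raw == escaped;
}
```

Either string can be passed to the std::regex constructor; the raw form is usually easier to read.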
To obtain all matches, use std::sregex_token_iterator and pass 1 as the submatch argument so that it yields the Group 1 value of each match:
#include <regex>
#include <string>
#include <vector>

const std::regex reg(R"(start:\s*(\d+)\s*end:)");
std::string s = "garbage 111222 garbage ... 999888 fewfew... start: 123456 end: start: 654321 end:";
// The trailing 1 selects capture group 1 (the digits) from each match.
std::vector<std::string> results(std::sregex_token_iterator(s.begin(), s.end(), reg, 1),
                                 std::sregex_token_iterator());
See the online C++ demo
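Since the question asks for integer values, the captured strings may still need converting. The following is only a sketch of one way to do that: extractIntegers is a hypothetical helper name, and the choice of long with std::stol is an assumption, not something the answer prescribes.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): collects the Group 1
// digits of every "start: ... end:" pair and converts them to long.
std::vector<long> extractIntegers(const std::string& text) {
    static const std::regex reg(R"(start:\s*(\d+)\s*end:)");
    std::vector<long> values;
    // The trailing 1 asks the iterator for capture group 1 of each match.
    for (std::sregex_token_iterator it(text.begin(), text.end(), reg, 1), end;
         it != end; ++it) {
        // std::stol throws std::out_of_range if the digits do not fit in a long.
        values.push_back(std::stol(it->str()));
    }
    return values;
}
```

Calling extractIntegers on the sample line from the question should yield 123456 and 654321 and skip the numbers that are not enclosed by start: and end:.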
If any value can appear between start: and end:, replace \d+ with .*? (which matches zero or more characters other than line breaks, as few as possible).
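A minimal sketch of that variant, assuming the surrounding whitespace should still be excluded from the result. The name extractBetween is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns whatever text sits between "start:" and "end:",
// without the whitespace that pads it on either side.
std::vector<std::string> extractBetween(const std::string& text) {
    // (.*?) is lazy, so it stops at the first "end:" that follows.
    static const std::regex reg(R"(start:\s*(.*?)\s*end:)");
    std::sregex_token_iterator first(text.begin(), text.end(), reg, 1);
    std::sregex_token_iterator last;  // default-constructed end-of-sequence iterator
    return std::vector<std::string>(first, last);
}
```

Because . does not match line breaks, a value that spans several lines would not be found by this sketch.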
To extract the integer values between start: and end: without a lookbehind you could capture one or more digits in a capturing group:
start:\s*(\d+)(?=\s*end:)
start: matches the literal text, followed by \s*, zero or more whitespace characters
(\d+) captures one or more digits in a group
(?=\s*end:) is a positive lookahead that asserts that what follows is zero or more whitespace characters and end:
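The lookahead pattern can be sketched in C++ as follows. std::regex's default ECMAScript grammar supports lookahead but not lookbehind, which is why the (?<=start:) form from the question cannot be used as is. The name extractWithLookahead is hypothetical, and std::sregex_iterator is just one way to read Group 1.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the digits captured by Group 1 for every
// "start:" that is followed by digits, optional whitespace, and "end:".
std::vector<std::string> extractWithLookahead(const std::string& text) {
    // The lookahead checks for "end:" without making it part of the match.
    static const std::regex reg(R"(start:\s*(\d+)(?=\s*end:))");
    std::vector<std::string> digits;
    for (std::sregex_iterator it(text.begin(), text.end(), reg), end;
         it != end; ++it) {
        digits.push_back((*it)[1].str());  // (*it)[0] would also include "start:"
    }
    return digits;
}
```

On the sample input this should return the same two values as the capturing-group pattern; the only difference is that end: is checked but not consumed.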