
C++ regex string capture

Trying to get C++ regex string capture to work. I have tried all four combinations of Windows vs. Linux and Boost vs. native C++11. The sample code is:

#include <string>
#include <iostream>
#include <boost/regex.hpp>
//#include <regex>

using namespace std;
using namespace boost;

int main(int argc, char** argv)
{
    smatch sm1;
    regex_search(string("abhelloworld.jpg"), sm1, regex("(.*)jpg"));
    cout << sm1[1] << endl;
    smatch sm2;
    regex_search(string("hell.g"), sm2, regex("(.*)g"));
    cout << sm2[1] << endl;
}

The closest to working is g++ 4.7 with Boost 1.51.0. There, the first cout outputs the expected abhelloworld. (trailing dot included), but the second cout outputs nothing.

g++ 4.7 with -std=gnu++11 and <regex> instead of <boost/regex.hpp> produces no output.

Visual Studio 2012 using native <regex> yields an exception regarding incompatible string iterators.

Visual Studio 2008 with Boost 1.51.0 and <boost/regex.hpp> yields an exception regarding "Standard C++ Libraries Invalid argument".

Are these bugs in C++ regex, or am I doing something wrong?

Michael Malak asked Aug 25 '12 19:08


1 Answer

Are these bugs in C++ regex, or am I doing something wrong?

At the time of your posting, gcc didn't support <regex>, as noted in the other answer (it does now). The other failures share one cause: you are passing temporary string objects. Change your code to the following:

smatch sm1;
string s1("abhelloworld.jpg");
regex_search(s1, sm1, regex("(.*)jpg"));
cout << sm1[1] << endl;
smatch sm2;
string s2("hell.g");
regex_search(s2, sm2, regex("(.*)g"));
cout << sm2[1] << endl;

Your original example compiles because regex_search takes the string by const reference, which a temporary can bind to. However, smatch stores only iterators into that string, and the temporary is destroyed at the end of the regex_search statement, so the iterators are already dangling when sm1[1] is read. The solution is to not pass temporaries.

If you look in the C++ standard at [§ 28.11.3/5], you will find the following:

Returns: The result of regex_search(s.begin(), s.end(), m, e, flags).

This means that internally only iterators into the string you pass are used. The string itself is never copied or stored, so if you pass a temporary, the match results hold iterators into an object that no longer exists, and those iterators are invalid.

Jesse Good answered Sep 30 '22 00:09