
Regex C++: extract substring

Tags: c++, regex

I would like to extract a substring that sits between two other substrings.
For example, from /home/toto/FILE_mysymbol_EVENT.DAT
or just FILE_othersymbol_EVENT.DAT
I would like to get mysymbol and othersymbol, respectively.

I don't want to use Boost or other libraries, just standard C++. The one exception is CERN's ROOT library with its TRegexp class, but I don't know how to use it.

eouti asked Jul 24 '12 08:07


2 Answers

Since C++11, regular expressions are part of the C++ standard library. This program shows how to use them to extract the string you are after:

    #include <iostream>
    #include <regex>
    #include <string>

    int main()
    {
        const std::string s = "/home/toto/FILE_mysymbol_EVENT.DAT";

        // Group 1 captures the text between "FILE_" and "_EVENT.DAT".
        std::regex rgx(".*FILE_(\\w+)_EVENT\\.DAT.*");
        std::smatch match;

        if (std::regex_search(s.begin(), s.end(), match, rgx))
            std::cout << "match: " << match[1] << '\n';
    }

It will output:

 match: mysymbol 

Note, though, that when this answer was first written it did not work with GCC, whose standard library (libstdc++) lacked a working <regex> implementation until GCC 4.9. It worked well in VS2010 (and probably VS2012), and should have worked in Clang.


By now (late 2016) all modern C++ compilers and their standard libraries are fully up to date with the C++11 standard, and most if not all of C++14 as well. GCC 6 and the upcoming Clang 4 support most of the coming C++17 standard as well.
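As a rough sketch, the program above can be wrapped in a reusable function that handles both inputs from the question. The name extract_symbol and the choice to return an empty string on failure are assumptions made for illustration, not part of the original answer. Because std::regex_search looks for the pattern anywhere in the input, the leading and trailing ".*" can be dropped.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (name chosen for illustration): returns the text between
// "FILE_" and "_EVENT.DAT", or an empty string if the input has no such part.
std::string extract_symbol(const std::string& path)
{
    // regex_search finds the pattern anywhere in the input, so no
    // leading or trailing ".*" is needed.
    static const std::regex rgx("FILE_(\\w+)_EVENT\\.DAT");
    std::smatch match;
    if (std::regex_search(path, match, rgx))
        return match[1].str();
    return std::string();
}
```

Calling extract_symbol("FILE_othersymbol_EVENT.DAT") should give othersymbol. Keep in mind that \w also matches underscores, so a symbol that itself contains an underscore is captured whole.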

Some programmer dude answered Sep 28 '22 08:09


TRegexp only supports a very limited subset of regular expressions compared to other regex flavors. This makes constructing a single regex that suits your needs somewhat awkward.

One possible solution:

[^_]*_([^_]*)_ 

will match the string until the first underscore, then capture all characters until the next underscore. The relevant result of the match is then found in group number 1.
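To see what group 1 captures, here is a minimal sketch that applies the same pattern with std::regex rather than TRegexp. This is a substitution made purely for illustration: it does not show ROOT's API, and the function name first_field_after_underscore is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: applies the pattern above with std::regex (not TRegexp)
// and returns capture group 1, or an empty string when nothing matches.
std::string first_field_after_underscore(const std::string& name)
{
    static const std::regex rgx("[^_]*_([^_]*)_");
    std::smatch match;
    if (std::regex_search(name, match, rgx))
        return match[1].str();
    return std::string();
}
```

This pattern relies on the first two underscores in the whole string, so a directory name that contains an underscore would change the result.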

But in your case, why use a regex at all? Just find the first and second occurrence of your delimiter _ in the string and extract the characters between those positions.
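A minimal sketch of that regex-free approach, using only std::string, might look like the following. The function name between_underscores is hypothetical, and skipping the directory part of the path first is an extra assumption added here so that underscores in directory names do not interfere.

```cpp
#include <string>

// Hypothetical helper (the name is illustrative, not from the answer):
// returns the text between the first and second '_' of the file name,
// or an empty string if the name does not contain two underscores.
std::string between_underscores(const std::string& path)
{
    // Assumption: skip any directory part so that underscores in
    // directory names are ignored.
    const std::string::size_type slash = path.find_last_of('/');
    const std::string::size_type start =
        (slash == std::string::npos) ? 0 : slash + 1;

    const std::string::size_type first = path.find('_', start);
    if (first == std::string::npos)
        return std::string();

    const std::string::size_type second = path.find('_', first + 1);
    if (second == std::string::npos)
        return std::string();

    // Characters strictly between the two delimiters.
    return path.substr(first + 1, second - first - 1);
}
```

This compiles as C++98, so it should also fit older toolchains where <regex> is not usable.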

Tim Pietzcker answered Sep 28 '22 08:09