
C++ regular expression over a stream

I have a very large text file (up to a few hundred MB) that I would like to process with the standard library's regular expressions (std::regex). The region I am trying to match spans several lines and occurs at least a few thousand times in the file.

Can I use stream iterators for that purpose? I've tried std::istream_iterator<char>, but had no luck. Could someone post a minimal working example?

Note that I am looking for a solution that uses only the STL. Ideally, I would like to iterate over all matches.

EDIT

Having read the comment, I understand this is not possible. So maybe there is another way to iterate over the regex matches in a large text file:

#include <regex>
#include <iostream>
#include <string>

const std::string s = R"(Quick brown fox
jumps over
several lines)"; // At least 200MB of multiline text here

int main(int argc, char* argv[]) {

    std::regex find_jumping_fox("(Quick(?:.|\\n)+?jump\\S*?)");
    auto it = std::sregex_iterator(s.begin(), s.end(), find_jumping_fox);

    for (std::sregex_iterator i = it; i != std::sregex_iterator(); ++i) {
        std::smatch match = *i;
        std::string match_str = match.str();
        std::cout << match_str << '\n';
    }
}
tnorgd asked Oct 22 '15 15:10


1 Answer

You can't match on a stream, because what would a failed match mean? Either the start of the regex has matched and more characters need to be streamed in, or no part of the stream has matched, and the matcher has no way to tell these two cases apart.
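There is also a purely mechanical reason the std::istream_iterator<char> attempt does not work: std::regex_search and std::regex_iterator require bidirectional iterators, while a stream iterator is only a single-pass input iterator. A minimal sketch of that check follows; the helper name usable_with_std_regex is made up for illustration and is not part of the standard library.

```cpp
#include <iterator>
#include <string>
#include <type_traits>

// Hypothetical helper (not a standard facility): true when the iterator
// category of It is at least bidirectional, which is what the std::regex
// algorithms and std::regex_iterator require.
template <class It>
constexpr bool usable_with_std_regex() {
    return std::is_base_of<
        std::bidirectional_iterator_tag,
        typename std::iterator_traits<It>::iterator_category>::value;
}

// String iterators are random access, so they qualify.
static_assert(usable_with_std_regex<std::string::const_iterator>(),
              "string iterators can be used with std::regex");

// Stream iterators are input iterators only, so they do not.
static_assert(!usable_with_std_regex<std::istream_iterator<char>>(),
              "istream_iterator cannot be used with std::regex");
```

Note also that std::istream_iterator<char> reads with operator>>, which skips whitespace by default, so it would drop the newlines the pattern relies on even if the iterator category were not a problem.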

But after your edit, we can find offsets and ranges of matches on a string. You'll want to use:

const std::vector<std::smatch> foo = { std::sregex_iterator(std::cbegin(s), std::cend(s), find_jumping_fox), std::sregex_iterator() };

It's explained in complete detail here: https://topanswers.xyz/cplusplus?q=729#a845
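For the file-based case in the question, one STL-only option is to read the whole file into a std::string first and then iterate over that string. The following is a rough sketch under that assumption, not a definitive implementation; the names MatchSpan, slurp and find_all are hypothetical and introduced here only for illustration. It assumes the whole file fits in memory.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical record: where a match starts and what it matched.
struct MatchSpan {
    std::size_t offset;  // offset of the match from the start of the text
    std::string text;    // copy of the matched characters
};

// Reads everything remaining in the stream into one string.
// std::istreambuf_iterator reads raw characters, so whitespace and
// newlines are preserved.
std::string slurp(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Collects every non-overlapping match of re in text.
std::vector<MatchSpan> find_all(const std::string& text, const std::regex& re) {
    std::vector<MatchSpan> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        out.push_back({static_cast<std::size_t>(it->position()), it->str()});
    }
    return out;
}
```

A caller would open the file with std::ifstream in std::ios::binary mode, pass it to slurp, and then call find_all with the regex. The sketch copies each match into its own std::string because a std::smatch only stores iterators into the searched string; a std::vector<std::smatch> is valid only as long as that string is alive and unmodified.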

Jonathan Mee answered Oct 21 '22 03:10