I have a very large text file (up to a few hundred MB) that I would like to process with STL regular expressions. The region I want to match spans several lines and occurs at least a few thousand times in the file.
Can I use stream iterators for that purpose? I've tried std::istream_iterator<char>, but no luck. Could someone post a minimal working example?
Note that I am looking for a solution involving only the STL. Ideally, I would like to iterate over all matches.
EDIT
Having read the comment, I understand this is not possible. So maybe there is another way to iterate over the regex matches in a large text file:
#include <regex>
#include <iostream>
#include <string>
const std::string s = R"(Quick brown fox
jumps over
several lines)"; // At least 200MB of multiline text here
int main() {
    // In the default ECMAScript grammar '.' does not match a newline, hence (?:.|\n).
    std::regex find_jumping_fox("(Quick(?:.|\\n)+?jump\\S*?)");
    auto it = std::sregex_iterator(s.begin(), s.end(), find_jumping_fox);
    for (std::sregex_iterator i = it; i != std::sregex_iterator(); ++i) {
        std::smatch match = *i;
        std::string match_str = match.str();
        std::cout << match_str << '\n';
    }
}
You can't match on a stream, because what would a failed match mean? Has the start of the regex matched, so that more characters need to be streamed in, or has no part of the stream matched?
But after your edit, we can find the offsets and ranges of the matches in a string. You'll want to use:
const std::vector<std::smatch> foo = { std::sregex_iterator(std::cbegin(s), std::cend(s), find_jumping_fox), std::sregex_iterator() };
It's explained in complete detail here: https://topanswers.xyz/cplusplus?q=729#a845
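As a rough sketch of how the offsets and ranges could be pulled out, the helper below walks the matches with std::sregex_iterator and records position() and length() for each one. The name match_ranges is hypothetical. Storing plain offsets rather than std::smatch objects is a design choice made here, not something the answer prescribes: an std::smatch keeps iterators into the searched string, so that string has to outlive it.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: return (offset, length) for every match of `re` in `text`.
// Offsets stay meaningful even after the match objects are gone.
std::vector<std::pair<std::size_t, std::size_t>>
match_ranges(const std::string& text, const std::regex& re) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    // A default-constructed std::sregex_iterator is the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        ranges.emplace_back(static_cast<std::size_t>(it->position()),
                            static_cast<std::size_t>(it->length()));
    }
    return ranges;
}
```

Each pair can then be turned back into text with s.substr(offset, length), or used to index into the original buffer without copying.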