I am trying to find out how many regex matches there are in a string. I'm using an iterator to step through the matches and an integer to record how many there were.
#include <windows.h>        // GetTickCount
#include <boost/regex.hpp>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
using namespace std;

int main()
{
    DWORD before = GetTickCount();

    // Read the whole file into a std::string. Assigning a raw char buffer
    // filled by f.read() to a string needs a terminating '\0' that read() never writes.
    ifstream f("c:\\temp\\test.txt");
    string text((istreambuf_iterator<char>(f)), istreambuf_iterator<char>());
    f.close();

    boost::regex re("^(\\d{5})\\s(\\d{8})\\s(.*)\\s(.*)\\s(.*)\\s(\\d{8})\\s(.{1})$");

    // Submatch index 0: the token iterator steps through whole matches.
    boost::sregex_token_iterator itr(text.begin(), text.end(), re, 0);
    boost::sregex_token_iterator end;
    long count = 0;
    for (; itr != end; ++itr)
    {
        count++;
    }

    DWORD after = GetTickCount();
    cout << "Found " << count << " matches in " << (after - before) << " ms." << endl;
}
In my example, count is always 1, even if I add code to the for loop to print the matches (and there are plenty). Why is that? What am I doing wrong?
TEST INPUT:
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
OUTPUT (without printing the matches):
Found 1 matches in 16 ms.
If I change the for loop to this:
count = 0;
for (; itr != end; ++itr)
{
    string match(itr->first, itr->second);
    cout << match << endl;
    count++;
}
I get this as output:
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
Found 1 matches in 47 ms.
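Heh. Your problem is your regex. Change your (.*) groups to (.*?) (Boost.Regex supports these lazy quantifiers). You think you're seeing each line being matched, but in fact you're seeing the entire text matched once. With Boost's default Perl settings, "." also matches a newline and "^"/"$" match at every line boundary, so each greedy (.*) runs as far as it can and the single match only ends on the last line of the file. With (.*?), each group takes as few characters as possible, so every match stops at the end of its own line.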
To see the issue for yourself, wrap each match in brackets in the debug output of your loop:
cout << "[" << match << "]" << endl;