Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does my Boost.Regex search report only one match iteration?

I am trying to find out how many regex matches are in a string. I'm using an iterator to iterate the matches, and and integer to record how many there were.

long int before = GetTickCount();
string text;

boost::regex re("^(\\d{5})\\s(\\d{8})\\s(.*)\\s(.*)\\s(.*)\\s(\\d{8})\\s(.{1})$");
char * buffer;
long length;
long count;
ifstream f;


f.open("c:\\temp\\test.txt", ios::in | ios::ate);
length = f.tellg();
f.seekg(0, ios::beg);

buffer = new char[length];

f.read(buffer, length);
f.close();

text = buffer;
boost::sregex_token_iterator itr(text.begin(), text.end(), re, 0);
boost::sregex_token_iterator end;

count = 0;
for(; itr != end; ++itr)
{
    count++;
}

long int after = GetTickCount();
cout << "Found " << count << " matches in " << (after-before) << " ms." << endl;

In my example, count always returns 1, even if I put code in the for loop to show the matches (and there are plenty). Why is that? What am I doing wrong?

Edit

TEST INPUT:

12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N

OUTPUT (without matches):

Found 1 matches in 16 ms.

If I change the for loop to this:

count = 0;
for(; itr != end; ++itr)
{
    string match(itr->first, itr->second);
    cout << match << endl;
    count++;
}

I get this as output:

12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
Found 1 matches in 47 ms.
like image 825
scottm Avatar asked Dec 18 '22 08:12

scottm


1 Answers

Heh. Your problem is your regex. Change your (.\*)s to (.\*?)s (assuming that's supported). You think you're seeing each line being matched, but in fact you're seeing the entire text being matched because your pattern is greedy.

To see the issue illustrated, change the debug output in your loop to:

cout << "[" << match << "]" << endl;
like image 180
chaos Avatar answered Mar 02 '23 18:03

chaos