I have built a match pattern in RegexBuddy that behaves exactly as I expect, but I cannot get it to work in Delphi XE, at least not with the latest built-in TRegEx or TPerlRegEx.
My real-world code has six capture groups, but a simpler example illustrates the problem. This code shows "3" in the first dialog and then raises an exception (-7 index out of bounds) when executing the second dialog.
var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00 X1 90 55KENNY BENNY');
  ShowMessage(IntToStr(M.Groups.Count));
  ShowMessage(M.Groups['time'].Value);
end;
But if I use only one capture group
Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})');
The first dialog shows "2" and the second dialog shows the time "00:00" as expected.
Being limited to a single named capture group would be quite restrictive, but that is not the cause. If I change the capture group name to, for example, "atime":
var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<atime>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00 X1 90 55KENNY BENNY');
  ShowMessage(IntToStr(M.Groups.Count));
  ShowMessage(M.Groups['atime'].Value);
end;
I get "3" and "00:00", just as expected. Are there reserved words I cannot use? I don't think so, because in my real example I've tried completely random names. I just cannot figure out what causes this behaviour.
When pcre_get_stringnumber does not find the name, PCRE_ERROR_NOSUBSTRING is returned. PCRE_ERROR_NOSUBSTRING is defined in RegularExpressionsAPI as PCRE_ERROR_NOSUBSTRING = -7.
Some testing shows that pcre_get_stringnumber returns PCRE_ERROR_NOSUBSTRING for every name whose first letter is in the range k to z, and that this range depends on the first letter of judge. Changing judge to something else changes the range.
As I see it, there are at least two bugs involved here: one in pcre_get_stringnumber, and one in TGroupCollection.GetItem, which needs to raise a proper exception instead of SRegExIndexOutOfBounds.
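Until the name lookup is fixed, one possible workaround is to read the groups by number rather than by name, which should not go through pcre_get_stringnumber at all. The following is only a minimal sketch, assuming that numeric indexing of M.Groups is unaffected by the bug; the program name and the Count guard are illustrative additions, not part of the original code.

```delphi
program GroupByIndex;
{$APPTYPE CONSOLE}
uses
  SysUtils, RegularExpressions;
var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00 X1 90 55KENNY BENNY');
  // Group 0 is the whole match; capture groups are numbered from 1
  // in the order their opening parentheses appear in the pattern.
  if M.Success and (M.Groups.Count > 2) then
  begin
    Writeln(M.Groups[1].Value); // the group named "time"
    Writeln(M.Groups[2].Value); // the group named "judge"
  end;
  ReadLn;
end.
```

The drawback is that the numbers have to be kept in sync with the pattern by hand, so this is best treated as a stopgap.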
The bug seems to be in the RegularExpressionsAPI unit that wraps the PCRE library, or in the PCRE OBJ files that it links. If I run this code:
program Project1;
{$APPTYPE CONSOLE}
uses
  SysUtils, RegularExpressionsAPI;
var
  myregexp: Pointer;
  Error: PAnsiChar;
  ErrorOffset: Integer;
  Offsets: array[0..300] of Integer;
  OffsetCount, Group: Integer;
begin
  try
    myregexp := pcre_compile('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})', 0, @Error, @ErrorOffset, nil);
    if (myregexp <> nil) then begin
      OffsetCount := pcre_exec(myregexp, nil, '00:00 X1 90 55KENNY BENNY', Length('00:00 X1 90 55KENNY BENNY'), 0, 0, @Offsets[0], High(Offsets));
      if (OffsetCount > 0) then begin
        Group := pcre_get_stringnumber(myregexp, 'time');
        WriteLn(Group);
        Group := pcre_get_stringnumber(myregexp, 'judge');
        WriteLn(Group);
      end;
    end;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  ReadLn;
end.
It prints -7 and 2 instead of 1 and 2.
If I remove RegularExpressionsAPI from the uses clause and add the pcre unit from my TPerlRegEx component, then it correctly prints 1 and 2.
The RegularExpressionsAPI unit in Delphi XE is based on my pcre unit, and the RegularExpressionsCore unit is based on my PerlRegEx unit. Embarcadero did make some changes to both units. They also compiled their own OBJ files from the PCRE library, which are linked by RegularExpressionsAPI.
I have reported this bug as QC 92497.
I have also created a separate report, QC 92498, to request that TGroupCollection.GetItem raise a more sensible exception when requesting a named group that does not exist. (This code is in the RegularExpressions unit, which is based on code written by Vincent Parrett, not myself.)