It is common knowledge that SameStr(S1, S2)
is faster than S1 = S2
, where var S1, S2: string
in Delphi.
(And, of course, SameText(S1, S2)
is much faster than AnsiLowerCase(S1) = AnsiLowerCase(S2)
.)
But, as far as I understand it, SameStr(S1, S2)
does exactly the same thing as S1 = S2
, so I cannot help but wonder why in the world the Delphi compiler doesn't use the SameStr
code when it test for string equality using the =
operator. Surely there must be a reason for this?
A trivial program,
program Project1;
{$APPTYPE CONSOLE}
uses
SysUtils,
RejbrandCommon;
const
N = 1000000;
var
Strings1, Strings2: StringArray;
i: integer;
b: {dummy }boolean;
procedure CreateRandomStringArrays;
var
i: integer;
begin
SetLength(Strings1, N);
SetLength(Strings2, N);
for i := 0 to N - 1 do
begin
Strings1[i] := RandomString(0, 40);
Strings2[i] := RandomString(0, 40);
end;
end;
begin
CreateRandomStringArrays;
StartClock;
for i := 0 to N - 1 do
if Strings1[i] = Strings2[i] then
b := not b;
StopClock;
OutputClock;
StartClock;
for i := 0 to N - 1 do
if SameStr(Strings1[i], Strings2[i]) then
b := not b;
StopClock;
OutputClock;
Pause;
end.
where
function RandomString(const LowerLimit: integer = 2; const UpperLimit: integer = 20): string;
var
N, i: integer;
begin
N := RandomRange(LowerLimit, UpperLimit);
SetLength(result, N);
for i := 1 to N do
result[i] := RandomChar;
end;
and the inlined
function RandomChar: char;
begin
result := chr(RandomRange(ord('A'), ord('Z')));
end;
and the "clock" functions just wrap QueryPerformanceCounter
, QueryPerformanceFrequency
, and Writeln
, produces the output
2.56599325762716E-0002
1.24310093156453E-0002
ratio ~ 2.06
If the difference in length of the two strings that we compare is large, then the difference is even bigger. We try
Strings1[i] := RandomString(0, 0); // = '';
Strings2[i] := RandomString(0, 40);
and obtain
1.81630411160156E-0002
4.44662043198641E-0003
ratio ~ 4.08
So why doesn't the compiler use the SameStr
code when writing assembly for S1 = S2
?
After reading Cosmin Prund's excellent answer, I couldn't resist setting
Strings1[i] := RandomString(40, 40);
Strings2[i] := RandomString(40, 40);
to produce strings of equal length and indeed.
2.74783364614126E-0002
1.96818773095322E-0002
ratio ~ 1.40
Hm... SameStr
still wins...
CPU Brand String: Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz
Memory: 6 GB
OS: Windows 7 Home Premium (64-bit)
Compiler/RTL: Delphi 2009
It would seem (see the comments below Cosmin Prund's excellent answer) like the =
operator was changed between D2009 and D2010. Can anyone confirm this?
It all depends on how you're building the random strings. I used a modified version of the code, because very few of us have the RejbrandCommon unit, and because I wanted to use Excel to finish my analyses (and make pretty pictures).
Code (skip over the code to see some conclusions):
program Project3;
{$APPTYPE CONSOLE}
uses
SysUtils, Windows;
const
StringsNumber = 2000000;
var
Strings1, Strings2: array of string;
StrLen: integer;
b: {dummy }boolean;
function RandomString(MinLen, MaxLen:Integer):string;
var N, i:Integer;
begin
N := MinLen + Random(MaxLen-MinLen);
Assert(N >= MinLen); Assert(N <= MaxLen);
SetLength(Result, N);
for i:=1 to N do
Result[i] := Char(32 + Random(1024)); // Random Unicode Char
end;
procedure CreateRandomStringArrays(StrLen:Integer);
var
i: integer;
StrLen2:Integer;
begin
SetLength(Strings1, StringsNumber);
SetLength(Strings2, StringsNumber);
for i := 0 to StringsNumber - 1 do
begin
StrLen2 := StrLen + Random(StrLen div 2);
Strings1[i] := RandomString(StrLen, StrLen2);
StrLen2 := StrLen + Random(StrLen div 2);
Strings2[i] := RandomString(StrLen, StrLen2);
end;
end;
var C1, C2, C3, C4:Int64;
procedure RunTest(StrLen:Integer);
var i:Integer;
begin
CreateRandomStringArrays(StrLen);
// Test 1: using equality operator
QueryPerformanceCounter(C1);
for i := 0 to StringsNumber - 1 do
if Strings1[i] = Strings2[i] then
b := not b;
QueryPerformanceCounter(C2);
// Test 2: using SameStr
QueryPerformanceCounter(C3);
for i := 0 to StringsNumber - 1 do
if SameStr(Strings1[i], Strings2[i]) then
b := not b;
QueryPerformanceCounter(C4);
// Results:
C2 := C2 - C1;
C4 := C4 - C3;
WriteLn(IntToStr(StrLen) + #9 + IntToStr(C2) + #9 + IntToStr(C4));
end;
begin
WriteLn('Count'#9'='#9'SameStr');
for StrLen := 1 to 50 do
RunTest(StrLen);
end.
I made the CreateRandomStringArrays
routine take an StrLen parameter so I can run multiple similar tests in a loop. I made the code use QueryPerformanceCounter
directly and WriteLn
the results in tab-delimited format so I can copy/paste it into Excel. In Excel I get results in this form:
StrLen = SameStr 1 61527 69364 2 60188 69450 3 72130 68891 4 78847 85779 5 77852 78286 6 83612 88670 7 93936 96773
I then normalized things a bit. On each line made the maximum value "1" and the other value an percentage of 1. The result looks like this:
StrLen = SameStr 1 0,88 1 2 0,86 1 3 1 0,95 4 0,91 1 5 0,99 1 6 0,94 1 7 0,97 1
And then I started playing with the CreateRandomStringArrays
routine to run multiple tests.
This is how the plot looks like for the original case (CreateRandomStringArrays generates strings of random length, of length 1 to whatever's on the X axis). Blue is the result for the "=" operator, red is the result for the "SameStr", lower is better. It clear shows SameStr() has an edge for strings longer then 10 chars.
alt text http://fisiere.sediu.ro/PentruForumuri/V1_1_to_maxlen.png
Next test, made CreateRandomStringArrays
return strings of EQUAL length. The content of the strings is still fully-random, but the length of the strings is equal to whatever is on the X axis. This time the "=" operator is clearly more efficient:
alt text http://fisiere.sediu.ro/PentruForumuri/V1_equal_strings.png
Now the real question is, with REAL code, what's the probability of strings being equal? And how large has to be the difference for SameStr() to start gaining terrain? Next text, I'm building two strings, the first one's of StrLen (the number on the X axis), the second string has a length of StrLen + Random(4). Again, the "=" operator is better:
alt text http://fisiere.sediu.ro/PentruForumuri/V1_rnd_p4.png
Next test, I have two strings, each of length: StrLen + Random(StrLen div 10). The "=" operator is better.
alt text http://fisiere.sediu.ro/PentruForumuri/V1_rnd_pm_10p.png
... and my final test, strings of +/- 50% length. Formula: StrLen + Random(StrLen div 2). The SameStr()
wins this round:
alt text http://fisiere.sediu.ro/PentruForumuri/V1_rnd_pm_50p.png
I'm not sure. I didn't expect this to be linked to string length! I'd expect both functions to handle strings of different lengths lightning fast, but it doesn't happen.
SameStr has an optional third parameter: LocaleOptions. You get the behaviour similar to "=" by leaving out the third parameter: a case senstive locale independent comparison.
You would think that is the same as a binary comparison, but it isn't.
Since D2009 Delphi strings have an "code page" payload in addition to the length and the refcount.
StrRec = packed record
codePage: Word;
elemSize: Word;
refCnt: Longint;
length: Longint;
end;
When you do a String1 = String2
you are telling the compiler to ignore all the information about the string and simply do a binary comparison (it uses UStrEqual for that).
When you do a SameStr
or CompareStr
(which is used by SameStr) Delphi will first check the string for being Unicode (UTF-16LE) and if not, convert them before doing the actual work.
You can see this when you look at the implementation of CompareStr (the one without the third parameter) which, after initial optimisations, checks whether the arguments are unicode strings and if not, converts them using UStrFromLStr.
Update:
Actually, UStrEqual (by means of UStrCmp) also does conversions, like CompareStr it looks at the elemSize of the strings to decide whether they are Unicode or not and converts them if they aren't.
So the reason why the compiler does not use SameStr (CompareStr) for the =
operator eludes me at the moment. The only thing I can think of is that it has a nice analogy to the LStrEqual used to '='-compare AnsiStrings. I guess only the compiler people know.
Sorry to have wasted your time. I am leaving the answer though, so others won't have to go down this route of investigation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With