Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

CharInSet is much slower than IN, should I fix W1050 warning hint?

I use IN a lot in my project and I have lots of these warnings:

[DCC Warning] Unit1.pas(40): W1050 WideChar reduced to byte char in set expressions. Consider using CharInSet function in SysUtils unit.

I made a quick test and using CharInSet instead of IN is from 65%-100% slower:

if s1[i] in ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'] then

vs

if CharInSet(s1[i], ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']) then

Here is code for 2 tests, one works with loop through shorter strings, one loops once through a large string:

Adding 2 buttons on form I tested this for short string:

procedure TForm1.Button1Click(Sender: TObject);
var s1: string;
  t1, t2: TStopWatch;
  a, i, cnt, vMaxLoop: Integer;
begin
  s1 := '[DCC Warning] Unit1.pas(40): W1050 WideChar reduced to byte char in set expressions.  Consider using CharInSet function in SysUtils unit.';
  vMaxLoop := 10000000;

  cnt := 0;
  t1 := TStopWatch.Create;
  t1.Start;
  for a := 1 to vMaxLoop do
    for i := 1 to Length(s1) do
      if s1[i] in ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'] then
        inc(cnt);
  t1.Stop;

  cnt := 0;
  t2 := TStopWatch.Create;
  t2.Start;
  for a := 1 to vMaxLoop do
    for i := 1 to Length(s1) do
      if CharInSet(s1[i], ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']) then
        inc(cnt);
  t2.Stop;

  Button1.Caption := inttostr(t1.ElapsedMilliseconds) + ' - ' + inttostr(t2.ElapsedMilliseconds);
end;

And this for 1 long string:

procedure TForm1.Button2Click(Sender: TObject);
var s1: string;
  t1, t2: TStopWatch;
  a, i, cnt, vMaxLoop: Integer;
begin

  s1 := '[DCC Warning] Unit1.pas(40): W1050 WideChar reduced to byte char in set expressions.  Consider using CharInSet function in SysUtils unit.';
  s1 := DupeString(s1, 1000000);
  s1 := s1 + s1 + s1 + s1; // DupeString is limited, use this to create longer string

  cnt := 0;
  t1 := TStopWatch.Create;
  t1.Start;
  for i := 1 to Length(s1) do
    if s1[i] in ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'] then
      inc(cnt);
  t1.Stop;

  cnt := 0;
  t2 := TStopWatch.Create;
  t2.Start;
  for i := 1 to Length(s1) do
    if CharInSet(s1[i], ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']) then
      inc(cnt);
  t2.Stop;

  Button2.Caption := inttostr(t1.ElapsedMilliseconds) + ' - ' + inttostr(t2.ElapsedMilliseconds);
end;

Why do they recommend slower option, or how can I fix this warning without penalty in performance?

like image 928
Mike Torrettinni Avatar asked Mar 11 '16 14:03

Mike Torrettinni


1 Answers

The warning is telling you that your code may be defective. Because sets can only be based on types with ordinality of 256 or less, the base type is truncated to that size. Now, Char is an alias for WideChar and has ordinality 65536. So the warning is there to tell you that your program may not behave as you expect. For instance, one might ask what this expression evaluates to:

['A', chr(256)] = ['A']

One might expect it to evaluate false, but in fact it evaluates true. So I think you should certainly take heed of the compiler when it issues this warning.

Now, it so happens that your set, which can and should be written more concisely as ['A'..'Z'], is made up entirely of ASCII characters. And it happens (thanks to commentors Andreas and ventiseis) that in that case the compiler generates correct code for such a set, regardless of the ordinal value of the character to the left of the in operator. So

if s1[i] in ['A'..'Z'] then

will result in correct code, in spite of the warning. And the compiler is able to detect that the set's elements are contiguous and generate efficient code.

Note that this does depend on the set being a literal and so the optimisation can be performed by the compiler. And that is why it can perform so much better than CharInSet. Because CharInSet is a function, and the Delphi optimiser has limited power, CharInSet is not able to take advantage of the contiguous nature of this specific set literal.

The warning is annoying though, and do you really want to rely on remembering the very specific details of when this warning can safely be ignored. Another way to implement the test, and sidestep this warning is to use inequality operators:

if (c >= 'A') and (c <= 'Z') then
  ....

You'd probably wrap this in an inlined function to make the code even easier to read.

function IsUpperCaseEnglishLetter(c: Char): Boolean; inline;
begin
  Result := (c >= 'A') and (c <= 'Z');
end;

You should also ask yourself whether or not this code is a performance bottleneck. You should time your real program rather than such an artificial program. I'll bet that this code isn't a bottleneck and if so you should not treat performance as the key driver.

like image 135
David Heffernan Avatar answered Sep 20 '22 13:09

David Heffernan