Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the Fastest Way to Check for a Keyword in a List of Keywords in Delphi?

I have a small list of keywords. What I'd really like to do is akin to:

case MyKeyword of
  'CHIL': (code for CHIL);
  'HUSB': (code for HUSB);
  'WIFE': (code for WIFE);
  'SEX': (code for SEX);
  else (code for everything else);
end;

Unfortunately the CASE statement can't be used like that for strings.

I could use the straight IF THEN ELSE IF construct, e.g.:

if MyKeyword = 'CHIL' then (code for CHIL)
else if MyKeyword = 'HUSB' then (code for HUSB)
else if MyKeyword = 'WIFE' then (code for WIFE)
else if MyKeyword = 'SEX' then (code for SEX)
else (code for everything else);

but I've heard this is relatively inefficient.

What I had been doing instead is:

P := pos(' ' + MyKeyword + ' ', ' CHIL HUSB WIFE SEX ');
case P of
  1: (code for CHIL);
  6: (code for HUSB);
  11: (code for WIFE);
  17: (code for SEX);
  else (code for everything else);
end;

This, of course is not the best programming style, but it works fine for me and up to now didn't make a difference.

So what is the best way to rewrite this in Delphi so that it is both simple, understandable but also fast?

(For reference, I am using Delphi 2009 with Unicode strings.)


Followup:

Toby recommended I simply use the If Then Else construct. Looking back at my examples that used a CASE statement, I can see how that is a viable answer. Unfortunately, my inclusion of the CASE inadvertently hid my real question.

I actually don't care which keyword it is. That is just a bonus if the particular method can identify it like the POS method can. What I need is to know whether or not the keyword is in the set of keywords.

So really I want to know if there is anything better than:

if pos(' ' + MyKeyword + ' ', ' CHIL HUSB WIFE SEX ') > 0 then

The If Then Else equivalent does not seem better in this case being:

if (MyKeyword = 'CHIL') or (MyKeyword = 'HUSB') or (MyKeyword = 'WIFE') 
      or (MyKeyword = 'SEX') then

In Barry's comment to Kornel's question, he mentions the TDictionary Generic. I've not yet picked up on the new Generic collections and it looks like I should delve into them. My question here would be whether they are built for efficiency and how would using TDictionary compare in looks and in speed to the above two lines?


In later profiling, I have found that the concatenation of strings as in: (' ' + MyKeyword + ' ') is VERY expensive time-wise and should be avoided whenever possible. Almost any other solution is better than doing this.

like image 518
lkessler Avatar asked Jan 23 '10 23:01

lkessler


3 Answers

type TKeyword = ( ECHIL, EHUSB, EWIFE, ESEX )
const TKeyNames : array[TKeyword] of string = ( 'CHIL', 'HUSB', 'WIFE', 'SEX' );

Key : TKeyword

case Key of 
  ECHIL : (code for CHIL);
  EHUSB : (code for HUSB);
  EWIFE : (code for WIFE);
  ESEX  : (code for SEX);
  else (code for everything else);
end;

In general, don't use strings as "keys", use enumerations -- they're safer, and you get much of a speed increase.

Unfortunately Delphi (as far as I know) has no standard hashtable implementation that'd be easy to use, you can always roll up your own however.

BTW, code for SEX sounds much more fun than "will code for beer" :P

like image 79
Kornel Kisielewicz Avatar answered Oct 21 '22 17:10

Kornel Kisielewicz


You can use a const table (which must be alpha-sorted) and a quick binary sort. It's very efficient, and doesn't require any hashing.

Here is the function to use:

function IsKeyWord(const KeyWords: array of string; const aToken: String): Boolean;
// aToken must be already uppercase
var First, Last, I, Compare: Integer;
begin
  First := Low(Keywords);
  Last := High(Keywords);
  Result := False;
  while First <= Last do
  begin
    I := (First + Last) shr 1;
    Compare := CompareStr(Keywords[i],aToken);
    if Compare = 0 then
      begin
        Result := True;
        break;
      end
    else
    if Compare < 0  then
      First := I + 1 else
      Last := I - 1;
  end;
end;

And here, some examples of keywords:

const
  PASCALKEYWORDS: array[0..100] of string =
  ('ABSOLUTE', 'ABSTRACT', 'AND', 'ARRAY', 'AS', 'ASM', 'ASSEMBLER',
   'AUTOMATED', 'BEGIN', 'CASE', 'CDECL', 'CLASS', 'CONST', 'CONSTRUCTOR',
   'DEFAULT', 'DESTRUCTOR', 'DISPID', 'DISPINTERFACE', 'DIV', 'DO',
   'DOWNTO', 'DYNAMIC', 'ELSE', 'END', 'EXCEPT', 'EXPORT', 'EXPORTS',
   'EXTERNAL', 'FAR', 'FILE', 'FINALIZATION', 'FINALLY', 'FOR', 'FORWARD',
   'FUNCTION', 'GOTO', 'IF', 'IMPLEMENTATION', 'IN', 'INDEX', 'INHERITED',
   'INITIALIZATION', 'INLINE', 'INTERFACE', 'IS', 'LABEL', 'LIBRARY',
   'MESSAGE', 'MOD', 'NAME', 'NEAR', 'NIL', 'NODEFAULT', 'NOT', 'OBJECT',
   'OF', 'OR', 'OUT', 'OVERRIDE', 'PACKED', 'PASCAL', 'PRIVATE', 'PROCEDURE',
   'PROGRAM', 'PROPERTY', 'PROTECTED', 'PUBLIC', 'PUBLISHED', 'RAISE',
   'READ', 'READONLY', 'RECORD', 'REGISTER', 'REINTRODUCE', 'REPEAT', 'RESIDENT',
   'RESOURCESTRING', 'SAFECALL', 'SET', 'SHL', 'SHR', 'STDCALL', 'STORED',
   'STRING', 'STRINGRESOURCE', 'THEN', 'THREADVAR', 'TO', 'TRY', 'TYPE',
   'UNIT', 'UNTIL', 'USES', 'VAR', 'VARIANT', 'VIRTUAL', 'WHILE', 'WITH', 'WRITE',
   'WRITEONLY', 'XOR');

  DFMKEYWORDS: array[0..4] of string = (
    'END', 'FALSE', 'ITEM', 'OBJECT', 'TRUE');

  CKEYWORDS: array[0..47] of string = (
  'ASM', 'AUTO', 'BREAK', 'CASE', 'CATCH', 'CHAR', 'CLASS', 'CONST', 'CONTINUE',
  'DEFAULT', 'DELETE', 'DO', 'DOUBLE', 'ELSE', 'ENUM', 'EXTERN', 'FLOAT', 'FOR',
  'FRIEND', 'GOTO', 'IF', 'INLINE', 'INT', 'LONG', 'NEW', 'OPERATOR', 'PRIVATE',
  'PROTECTED', 'PUBLIC', 'REGISTER', 'RETURN', 'SHORT', 'SIGNED', 'SIZEOF',
  'STATIC', 'STRUCT', 'SWITCH', 'TEMPLATE', 'THIS', 'THROW', 'TRY', 'TYPEDEF',
  'UNION', 'UNSIGNED', 'VIRTUAL', 'VOID', 'VOLATILE', 'WHILE');

  CSHARPKEYWORDS : array[0..86] of string = (
  'ABSTRACT', 'AS', 'BASE', 'BOOL', 'BREAK', 'BY3', 'BYTE', 'CASE', 'CATCH', 'CHAR',
  'CHECKED', 'CLASS', 'CONST', 'CONTINUE', 'DECIMAL', 'DEFAULT', 'DELEGATE', 'DESCENDING',
  'DO', 'DOUBLE', 'ELSE', 'ENUM', 'EVENT', 'EXPLICIT', 'EXTERN', 'FALSE', 'FINALLY',
  'FIXED', 'FLOAT', 'FOR', 'FOREACH', 'FROM', 'GOTO', 'GROUP', 'IF', 'IMPLICIT',
  'IN', 'INT', 'INTERFACE', 'INTERNAL', 'INTO', 'IS', 'LOCK', 'LONG', 'NAMESPACE',
  'NEW', 'NULL', 'OBJECT', 'OPERATOR', 'ORDERBY', 'OUT', 'OVERRIDE', 'PARAMS',
  'PRIVATE', 'PROTECTED', 'PUBLIC', 'READONLY', 'REF', 'RETURN', 'SBYTE',
  'SEALED', 'SELECT', 'SHORT', 'SIZEOF', 'STACKALLOC', 'STATIC', 'STRING',
  'STRUCT', 'SWITCH', 'THIS', 'THROW', 'TRUE', 'TRY', 'TYPEOF', 'UINT', 'ULONG',
  'UNCHECKED', 'UNSAFE', 'USHORT', 'USING', 'VAR', 'VIRTUAL', 'VOID', 'VOLATILE',
  'WHERE', 'WHILE', 'YIELD');

And it's very easy to use:

  if IsKeyWord(PASCALKEYWORDS,UpperCase('BEGIN')) then
   ....

You can easily modify the IsKeyWord() function to return the index of the token if you need it.

function KeyWordIndex(const KeyWords: array of string; const aToken: String): integer;
// aToken must be already uppercase
var First, Last, Compare: Integer;
begin
  First := Low(Keywords);
  Last := High(Keywords);
  while First <= Last do
  begin
    result := (First + Last) shr 1;
    Compare := CompareStr(Keywords[result],aToken);
    if Compare = 0 then
      exit else
    if Compare < 0  then
      First := result + 1 else
      Last := result - 1;
  end;
  result := -1; // not found
end;
like image 23
Arnaud Bouchez Avatar answered Oct 21 '22 16:10

Arnaud Bouchez


Mostly I use the IndexText function from StrUtils for that purpose. It is similar to your pos approach, but the return value is independent of the indiviual length of the strings. As a gimmick it is also case insensitive (use IndexStr if you don't want this).

function IndexText(const AText: string; const AValues: array of string): Integer; overload;

The comment to these functions actually mentions the case construct:

{ AnsiMatchText & AnsiIndexText provide case like function for dealing with strings }

like image 3
Uwe Raabe Avatar answered Oct 21 '22 16:10

Uwe Raabe