Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What does the default TArray.Sort comparator actually do and when would you use it?

In Delphi XE2's help for System.Generics.Collections.TArray.Sort, it says

Note: If the Comparer parameter is provided, it is used to compare elements; otherwise the default comparator for the array elements is used. 

I dug back a bit and found that the default comparator for TArray.Sort is _LookupVtableInfo from System.Generics.Defaults. The code for this is

function _LookupVtableInfo(intf: TDefaultGenericInterface; info: PTypeInfo; size: Integer): Pointer;
var
  pinfo: PVtableInfo;
begin
  if info <> nil then
  begin
    pinfo := @VtableInfo[intf, info^.Kind];
    Result := pinfo^.Data;
    if ifSelector in pinfo^.Flags then
      Result := TTypeInfoSelector(Result)(info, size);
    if ifVariableSize in pinfo^.Flags then
      Result := MakeInstance(Result, size);
  end
  else
  begin
    case intf of
      giComparer: Result := Comparer_Selector_Binary(info, size);
      giEqualityComparer: Result := EqualityComparer_Selector_Binary(info, size);
    else
      System.Error(reRangeError);
      Result := nil;
    end;
  end;
end;

It is called as

IComparer<T>(_LookupVtableInfo(giComparer, TypeInfo(T), SizeOf(T)))

I've looked through this quite a bit and I'm not really all that sure I know what it does. Does it just compare the bit in memory against each other or what exactly?

Second part of the question is a more generalized one of what situations would you be likely to actually want to use the default comparator, or is it unlikely that you'd ever actually want to use it?

like image 596
jep Avatar asked Jul 30 '13 15:07

jep


3 Answers

The default comparer provides implementations for many common types. Specifically it supports the following:

  • Integral types: Byte, Word, Integer etc.
  • Enumerated types.
  • Floating point types.
  • Strings.
  • Sets.
  • Class instances.
  • Procedural variables.
  • Methods.
  • Variants.
  • Static arrays.
  • Dynamic arrays.
  • Interfaces.
  • Pointers.
  • Records.

For many of these types the default implementation is exactly what you would expect. For example, for integers, enumerated types, floating point types the implementation uses the <, > and = operators. For string the default implementation calls CompareStr.

For other types, the default implementation is probably less useful. For example, for records, the comparison is a bytewise binary compare. It's highly likely that you'd want to supply your own implementation of a comparer for a record. One thing to watch out for with records is that the default comparer will compare any padding in your record, and you never want to do that. So it is never useful for an aligned record that has padding. And I'd also question the utility for records that contain reference types.

For dynamic arrays, the default implementation compares length first, and then, if length is equal, compares the binary content of the array. So, that might be reasonable for arrays of simple value types. But for multi-dimensional dynamic arrays, or arrays of reference types, not so much.

For class instances, methods, procedural variables, interfaces the default comparer treats the operands as a pointer (two pointers in the case of methods) and performs an address comparison.

When you you want to use the default comparer? Well, you'd use it whenever it matches your requirements for a comparer. So it certainly makes sense for simple value types. Beyond that you'd need to decide on a case by case basis.

like image 151
David Heffernan Avatar answered Oct 01 '22 14:10

David Heffernan


The function you have posted there isn't actually the comparision function, but rather a function that returns a comparison function, based on the TypeInfo and SizeOf T.

Following that deeper, we see in Generics.Defaults many functions of the form:

function Compare_ name of type (Inst: Pointer; const Left, Right: Type ): Integer;

which are all have the same body (but note left and right have different types)

begin
  if Left < Right then
    Result := -1
  else if Left > Right then
    Result := 1
  else
    Result := 0;
end;

and finally for everything left

function BinaryCompare(const Left, Right: Pointer; Size: Integer): Integer;
var
  pl, pr: PByte;
  len: Integer;
begin
  pl := Left;
  pr := Right;
  len := Size;
  while len > 0 do
  begin
    Result := pl^ - pr^;
    if Result <> 0 then
      Exit;
    Dec(len);
    Inc(pl);
    Inc(pr);
  end;
  Result := 0;
end;
like image 45
Caleth Avatar answered Oct 01 '22 15:10

Caleth


David did a great job of textually describing how the default comparers work, but for some of you it might be easier to follow when you see how the underlying code is structured (and decide if the default comparers do apply).

I will cover only the Compare_ style of comparisons. The Equals_ style works in a similar way.

What happens is that _LookupVtableInfo selects an IComparer interface for Compare_ style comparisons (and a IEqualityComparer for Equals_ style).

Underneath those interfaces are not ordinary interfaces, but interface wrappers around global functions of this form for Compare_ style:

function Compare_t<T>(Inst: Pointer; const Left, Right: T): Integer;

and global procedures of this form for Equals_ style:

function Equals_t<T>(Inst: Pointer; const Left, Right: T): Integer;
function GetHashCode_t<T>(Inst: Pointer; const Left, Right: T): Integer;

The outcome of Compare_ style functions is straightforward, but slightly different from -1, 0, +1 that some people might expect:

< 0 for Left < Right
= 0 for Left = Right
> 0 for Left > Right

For the majority of cases, the implementation is very simple:

I have grouped the Compare_ style functions by how they do this.

  • Ordinal types (including enumerators and Int64).
  • Floating point (Real) types (including Comp and Currency).
  • Short strings (from the Turbo Pascal / Delphi 1 days).
  • Wide strings (the OLE style ones).
  • Methods.
  • Pointers (including Classes, Interfaces, Class References and Procedures).

(Ordinal types outside the range of 1, 2, 4, 8 bytes, and real types outside the range of 4, 8, 10 bytes raise an error as they are illegal).

The first group just subtracts Left from Right: signed/unsigned integers of 1 or 2 bytes length

function Compare_I1(Inst: Pointer; const Left, Right: Shortint): Integer;
function Compare_I2(Inst: Pointer; const Left, Right: Smallint): Integer;
function Compare_U1(Inst: Pointer; const Left, Right: Byte): Integer;
function Compare_U2(Inst: Pointer; const Left, Right: Word): Integer;

  Result := Left - Right;

The second group does a comparison:

function Compare_I4(Inst: Pointer; const Left, Right: Integer): Integer;
function Compare_I8(Inst: Pointer; const Left, Right: Int64): Integer;
function Compare_U4(Inst: Pointer; const Left, Right: LongWord): Integer;
function Compare_U8(Inst: Pointer; const Left, Right: UInt64): Integer;
function Compare_R4(Inst: Pointer; const Left, Right: Single): Integer;
function Compare_R8(Inst: Pointer; const Left, Right: Double): Integer;
function Compare_R10(Inst: Pointer; const Left, Right: Extended): Integer;
function Compare_RI8(Inst: Pointer; const Left, Right: Comp): Integer;
function Compare_RC8(Inst: Pointer; const Left, Right: Currency): Integer;
function Compare_WString(Inst: PSimpleInstance; const Left, Right: WideString): Integer;
function Compare_Pointer(Inst: PSimpleInstance; Left, Right: NativeUInt): Integer;

type
{$IFNDEF NEXTGEN}
  TPS1 = string[1];
  TPS2 = string[2];
  TPS3 = string[3];
{$ELSE NEXTGEN}
  OpenString = type string;
  TPS1 = string;
  TPS2 = string;
  TPS3 = string;
{$ENDIF !NEXTGEN}

function Compare_PS1(Inst: PSimpleInstance; const Left, Right: TPS1): Integer;
function Compare_PS2(Inst: PSimpleInstance; const Left, Right: TPS2): Integer;
function Compare_PS3(Inst: PSimpleInstance; const Left, Right: TPS3): Integer;
// OpenString allows for any String[n], see http://my.safaribooksonline.com/book/programming/borland-delphi/1565926595/5dot-language-reference/ch05-openstring
function Compare_PSn(Inst: PSimpleInstance; const Left, Right: OpenString): Integer;

  if Left < Right then
    Result := -1
  else if Left > Right then
    Result := 1
  else
    Result := 0;

function Compare_Method(Inst: PSimpleInstance; const Left, Right: TMethodPointer): Integer;
var
  LMethod, RMethod: TMethod;
begin
  LMethod := TMethod(Left);
  RMethod := TMethod(Right);
  if LMethod < RMethod then
    Result := -1
  else if LMethod > RMethod then
    Result := 1
  else
    Result := 0;
end;

Now we get to the interesting bits: the not-so-straightforward outcomes.

Strings use CompareStr. If you want something different, you can use TOrdinalIStringComparer

function Compare_LString(Inst: PSimpleInstance; const Left, Right: AnsiString): Integer;
function Compare_UString(Inst: PSimpleInstance; const Left, Right: UnicodeString): Integer;

  Result := CompareStr(Left, Right);

BinaryCompare is used for:

  • binary data including unknown, Char/WChar, Set, Array, Record. Exception if the binary data is 1, 2 or 4 bytes size in x86 and x64 and 8 bytes in x64, it will be compared as integers.
  • dynamic carrays (be careful when they are multi-dimensional!).
  • variants as a last resort (see further below)

For records that can be compared, it makes sense to perform operator overloading, and have the comparer use those operators.

Binary data of 1, 2, 4 or 8 bytes is an exception, which will give strange results on little-endian machines (Intel x86 and x64, and bi-endian Arm in little-endian mode):

function Comparer_Selector_Binary(info: PTypeInfo; size: Integer): Pointer;
begin
  case size of
    // NOTE: Little-endianness may cause counterintuitive results,
    // but the results will at least be consistent.
    1: Result := @Comparer_Instance_U1;
    2: Result := @Comparer_Instance_U2;
    4: Result := @Comparer_Instance_U4;
    {$IFDEF CPUX64}
    // 64-bit will pass const args in registers
    8: Result := @Comparer_Instance_U8;
    {$ENDIF}
  else
    Result := MakeInstance(@Comparer_Vtable_Binary, size);
  end;
end;

The rest is pure binary:

function Compare_Binary(Inst: PSimpleInstance; const Left, Right): Integer;
begin
  Result := BinaryCompare(@Left, @Right, Inst^.Size);
end;

function Compare_DynArray(Inst: PSimpleInstance; Left, Right: Pointer): NativeInt;
var
  len, lenDiff: NativeInt;
begin
  len := DynLen(Left);
  lenDiff := len - DynLen(Right);
  if lenDiff < 0 then
    Inc(len, lenDiff);
  Result := BinaryCompare(Left, Right, Inst^.Size * len);
  if Result = 0 then
    Result := lenDiff;
end;

As usual, Variants are in a league of their own. First VarCompareValue is tried. If that fails, then Compare_UString is tried. If that fails too, BinaryCompare is tried. If that fails: tough luck.

function Compare_Variant(Inst: PSimpleInstance; Left, Right: Pointer): Integer;
var
  l, r: Variant;
  lAsString, rAsString: string;
begin
  Result := 0; // Avoid warning.
  l := PVariant(Left)^;
  r := PVariant(Right)^;
  try
    case VarCompareValue(l, r) of
      vrEqual:        Exit(0);
      vrLessThan:     Exit(-1);
      vrGreaterThan:  Exit(1);
      vrNotEqual:
      begin
        if VarIsEmpty(L) or VarIsNull(L) then
          Exit(1)
        else
          Exit(-1);
      end;
    end;
  except // if comparison failed with exception, compare as string.
    try
      lAsString := PVariant(Left)^;
      rAsString := PVariant(Right)^;
      Result := Compare_UString(nil, lAsString, rAsString);
    except  // if comparison fails again, compare bytes.
      Result := BinaryCompare(Left, Right, SizeOf(Variant));
    end;
  end;
end;
like image 31
Jeroen Wiert Pluimers Avatar answered Oct 01 '22 15:10

Jeroen Wiert Pluimers