Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

TEqualityComparer<T> may fail for records due to alignment

Tags:

hash

delphi

We recently had an issue with a TDictionary<T> instance that failed to lookup items correctly that are were already included in the dictionary. The problem occurred only in 64Bit builds. I was able to break the problem down to this code:

var
  i1, i2: TPair<Int64,Integer>;
begin
  FillMemory(@i1, sizeof(i1), $00);
  FillMemory(@i2, sizeof(i1), $01);
  i1.Key := 2;
  i1.Value := -1;
  i2.Key := i1.Key;
  i2.Value := i1.Value;
  Assert(TEqualityComparer<TPair<Int64,Integer>>.Default.Equals(i1, i2));
  Assert(TEqualityComparer<TPair<Int64,Integer>>.Default.GetHashCode(i1) = TEqualityComparer<TPair<Int64,Integer>>.Default.GetHashCode(i2));
end;

The assertions fail in Win64 builds. The problem seems to occur due to the record alignment: The size of this TPair is 16 bytes, but only 12 bytes are filled with data. The TEqualityComparer however takes all 16 bytes into account. So 2 record values may be treated as not equal, although all members are equal, just because of the different previous content of the memory

Can this be considered as a bug or behavior by design? It is a pitfall in any case. What is the best solution for such situations?

As workaround one may use NativeInt instead of Integer, however this Integer type was not under our control.

like image 845
Joachim Marder Avatar asked Apr 05 '17 11:04

Joachim Marder


2 Answers

I don't think this is a bug. The behaviour is by design. Without inspection or perhaps some compile time support for understanding these types, it is hard to write a general purpose comparer for arbitrary structured types.

The default record comparer can only safely be used on types with no padding and containing only certain simple value types that can be compared using naive binary comparison. For instance, floating point types are out because their comparison operators are more complex. Think of NaNs, negative zero, etc.

I think that the only robust way to deal with this is to write your own equality comparer. Others have suggested default initialising all record instances, but this places a significant burden on consumers of such types, and runs the risk of obscure and hard to track down defects in case some code forgets to default initialise.

I would use TEqualityComparer<T>.Construct to create suitable equality comparers. This requires the least amount of boilerplate. You supply two anonymous methods: an equals function and a hash function, and Construct returns you a newly minted comparer.

You can wrap this up in a generic class like so:

uses
  System.Generics.Defaults,
  System.Generics.Collections;

{$IFOPT Q+}
  {$DEFINE OverflowChecksEnabled}
  {$Q-}
{$ENDIF}
function CombinedHash(const Values: array of Integer): Integer;
var
  Value: Integer;
begin
  Result := 17;
  for Value in Values do begin
    Result := Result*37 + Value;
  end;
end;
{$IFDEF OverflowChecksEnabled}
  {$Q+}
{$ENDIF}

type
  TPairComparer = class abstract
  public
    class function Construct<TKey, TValue>(
      const EqualityComparison: TEqualityComparison<TPair<TKey, TValue>>;
      const Hasher: THasher<TPair<TKey, TValue>>
    ): IEqualityComparer<TPair<TKey, TValue>>; overload; static;
    class function Construct<TKey, TValue>: IEqualityComparer<TPair<TKey, TValue>>; overload; static;
  end;


class function TPairComparer.Construct<TKey, TValue>(
  const EqualityComparison: TEqualityComparison<TPair<TKey, TValue>>;
  const Hasher: THasher<TPair<TKey, TValue>>
): IEqualityComparer<TPair<TKey, TValue>>;
begin
  Result := TEqualityComparer<TPair<TKey, TValue>>.Construct(
    EqualityComparison,
    Hasher
  );
end;

class function TPairComparer.Construct<TKey, TValue>: IEqualityComparer<TPair<TKey, TValue>>;
begin
  Result := Construct<TKey, TValue>(
    function(const Left, Right: TPair<TKey, TValue>): Boolean
    begin
      Result :=
        TEqualityComparer<TKey>.Default.Equals(Left.Key, Right.Key) and
        TEqualityComparer<TValue>.Default.Equals(Left.Value, Right.Value);
    end,
    function(const Value: TPair<TKey, TValue>): Integer
    begin
      Result := CombinedHash([
        TEqualityComparer<TKey>.Default.GetHashCode(Value.Key),
        TEqualityComparer<TValue>.Default.GetHashCode(Value.Value)
      ]);
    end
  )
end;

I've provided two overloads. If the default comparers for your two types are sufficient, then you can use the parameterless overload. Otherwise you can supply two anonymous methods bespoke to the types.

For your type, you would obtain a comparer like this:

TPairComparer.Construct<Int64, Integer>

Both of these simple types have default equality comparers that you can use. Hence the parameterless Construct overload can be used.

like image 147
David Heffernan Avatar answered Nov 19 '22 03:11

David Heffernan


The default comparer for records only works for records of pure value types without padding. Relying on it, generally, is not a good idea. For any records which require accurate hashing and equality comparison you really need to write your own comparers.

As has been noted, initializing all of your records with Default() is also an option, but this approach is both tedious and error prone - it is easy to forget to initialize a record and it is difficult to trace such an omission when it happens. The approach is also only effective at eliminating errors related to padding while a custom comparer can also handle reference types, etc.

This, for example, demonstrates a working solution to the problem :

program Project1;

uses
  SysUtils, Windows, StrUtils, Generics.Collections, Generics.Defaults,
  System.Hash;

type
  TPairComparer<TKey, TValue> = class(TEqualityComparer<TPair<TKey, TValue>>)
    public
      function Equals(const Left, Right: TPair<TKey, TValue>): Boolean; override;
      function GetHashCode(const Value: TPair<TKey, TValue>): Integer;  override;
  end;
  TInt64IntDict<TValue> = class(TDictionary<TPair<Int64, Integer>, TValue>)
    public constructor Create;
  end;

function TPairComparer<TKey, TValue>.Equals(const Left: TPair<TKey, TValue>;
                                            const Right: TPair<TKey, TValue>) : boolean;
begin
  result := TEqualityComparer<TKey>.Default.Equals(Left.Key, Right.Key) and
            TEqualityComparer<TValue>.Default.Equals(Left.Value, Right.Value);
end;

{$IFOPT Q+}
  {$DEFINE OVERFLOW_ON}
  {$Q-}
{$ELSE}
  {$UNDEF OVERFLOW_ON}
{$ENDIF}
function TPairComparer<TKey, TValue>.GetHashCode(const Value: TPair<TKey, TValue>) : integer;
begin
  result := THashBobJenkins.GetHashValue(Value.Key, SizeOf(Value.Key), 23 * 31);
  result := THashBobJenkins.GetHashValue(Value.Value, SizeOf(Value.Value), result * 31);
end;
{$IFDEF OVERFLOW_ON}
  {$Q+}
  {$UNDEF OVERFLOW_ON}
{$ENDIF}

constructor TInt64IntDict<TValue>.Create;
begin
  inherited Create(0, TPairComparer<Int64, Integer>.Create);
end;



var
  i1, i2: TPair<Int64, Integer>;
  LI64c : TPairComparer<Int64, Integer>;
  LDict : TInt64IntDict<double>;
begin
  FillMemory(@i1, SizeOf(i1), $00);
  FillMemory(@i2, SizeOf(i1), $01);
  i1.Key := 2;
  i1.Value := -1;
  i2.Key := i1.Key;
  i2.Value := i1.Value;
  WriteLn(Format('i1 key = %d, i1 value = %d', [i1.Key, i1.Value]));
  WriteLn(Format('i2 key = %d, i2 value = %d', [i2.Key, i2.Value]));

  WriteLn; WriteLn('Using Default comparer');
  if TEqualityComparer<TPair<Int64, Integer>>.Default.Equals(i1, i2) then
    WriteLn('i1 equals i2') else WriteLn('i1 not equals i2');
  if TEqualityComparer<TPair<Int64, Integer>>.Default.GetHashCode(i1) =
     TEqualityComparer<TPair<Int64, Integer>>.Default.GetHashCode(i2) then
       WriteLn('i1, i2 - hashes match') else WriteLn('i1, i2 - hashes do not match');

  WriteLn;  WriteLn('Using custom comparer');
  LI64c := TPairComparer<Int64, Integer>.Create;
  if LI64c.Equals(i1, i2) then
    WriteLn('i1 equals i2') else WriteLn('i1 not equals i2');
  if LI64c.GetHashCode(i1) = LI64c.GetHashCode(i2) then
       WriteLn('i1, i2 - hashes match') else WriteLn('i1, i2 - hashes do not match');
  WriteLn;
  LDict := TInt64IntDict<double>.Create;
  LDict.Add(i1, 1.23);
  if LDict.ContainsKey(i2) then
    WriteLn('Dictionary already contains key') else
      WriteLn('Dictionary does not contain key');
  ReadLn;
end.

This produces output

i1 key = 2, i1 value = -1
i2 key = 2, i2 value = -1

Using Default comparer
i1 not equals i2
i1, i2 - hashes do not match

Using custom comparer
i1 equals i2
i1, i2 - hashes match

Dictionary already contains key

That said, as David's answer demonstrates, using a delegated comparer will produce less overhead and should be favoured in practice.

like image 4
J... Avatar answered Nov 19 '22 03:11

J...