Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

TDictionary Hashing is broken for records of strings

This reproduces the problem :

program Project1;

{$APPTYPE CONSOLE}

uses
  Generics.Collections;
type
  TStringRec = record
    s1 : string;
    s2 : string;
  end;
  TGetHash<TKey,TValue> = class(TEnumerable<TPair<TKey,TValue>>)
    public
    type
      TItem = record
        HashCode: Integer;
        Key: TKey;
        Value: TValue;
      end;
      TItemArray = array of TItem;
    public
    FItems: TItemArray;
  end;
var
  LCrossRef : TDictionary<TStringRec, integer>;
  LRec : TStringRec;
  i : integer;
begin
  LCrossRef := TDictionary<TStringRec, integer>.Create();
  LRec.s1 := 'test1';
  LRec.s2 := 'test2';
  LCrossRef.Add(LRec, 1);
  LRec.s1 := 'test1';
  LRec.s2 := 'test2';
  if LCrossRef.TryGetValue(LRec, i) then begin
    writeln('ok');
  end else begin
    LCrossRef.Add(LRec, 1);
    for i := Low(TGetHash<TStringRec, integer>
                (LCrossRef).FItems)
          to High(TGetHash<TStringRec, integer>
                (LCrossRef).FItems) do
      WriteLn(TGetHash<TStringRec, integer>(LCrossRef).FItems[i].HashCode);
    WriteLn('not ok');
  end;
  ReadLn;
end.

The dictionary fails to retrieve the item and generates a different HashCode for records containing identical strings.

This is partially noted in QC-#122791 but the workaround to use packed records does not work for records of strings (at least the above example also fails when TStringRec is declared as packed record).

Is there a sensible workaround to this?

My current strategy is to concatenate the strings that would have otherwise gone into the record and use a TDictionary<string, TValue> instead, but that is naturally unsatisfying.

like image 525
J... Avatar asked Oct 02 '14 12:10

J...


2 Answers

This is a known limitation, that is by design. The default comparers and hashers for records are only intended to work for pure value type records, and for records that have no padding.

The designers could have opted to use RTTI to compare/hash records. However, they opted not to do so. Some obvious plausible reasons for that choice are:

  1. They did not wish to force the use of RTTI on the unwillling.
  2. There is a significant performance hit incurred from the use of RTTI.

The way to deal with this is to supply your own comparers and hashers when using the generic collections.

Your current strategy of concatenating strings won't work. Consider 'a' and 'aa', and then 'aa' and 'a'. To use a text based approach you'd want to serialize the record to, say, JSON.

like image 51
David Heffernan Avatar answered Nov 15 '22 15:11

David Heffernan


To expand on David's answer using an example from my codebase. I have a dictionary

  Records: TDictionary<TGazetteerRecord,TGazetteerRecord>

which is instantiated

  Records := TDictionary<TGazetteerRecord,TGazetteerRecord>.Create(InitCapacity, TGazRecordComparer.Create);

What makes this work is having a custom comparer in the construction of the dictionary.

TGazRecordComparer = class(TEqualityComparer<TGazetteerRecord>)
private
public
  function Equals(const Left, Right: TGazetteerRecord): Boolean; override;
  function GetHashCode(const Value: TGazetteerRecord): Integer;  override;
end;

Th implementation for this then replaces the default code for a record type. My example actually uses a class rather than a record but I don't see why this shouldn't work perfectly fine with a record type. Note that the comparer class is reference counted and therefore will be automatically disposed of when the dictionary is destroyed.

like image 38
Kanitatlan Avatar answered Nov 15 '22 13:11

Kanitatlan