We recently had an issue with a TDictionary<T>
instance that failed to lookup items correctly that are were already included in the dictionary. The problem occurred only in 64Bit builds. I was able to break the problem down to this code:
var
i1, i2: TPair<Int64,Integer>;
begin
FillMemory(@i1, sizeof(i1), $00);
FillMemory(@i2, sizeof(i1), $01);
i1.Key := 2;
i1.Value := -1;
i2.Key := i1.Key;
i2.Value := i1.Value;
Assert(TEqualityComparer<TPair<Int64,Integer>>.Default.Equals(i1, i2));
Assert(TEqualityComparer<TPair<Int64,Integer>>.Default.GetHashCode(i1) = TEqualityComparer<TPair<Int64,Integer>>.Default.GetHashCode(i2));
end;
The assertions fail in Win64 builds. The problem seems to occur due to the record alignment: The size of this TPair is 16 bytes, but only 12 bytes are filled with data. The TEqualityComparer
however takes all 16 bytes into account. So 2 record values may be treated as not equal, although all members are equal, just because of the different previous content of the memory
Can this be considered as a bug or behavior by design? It is a pitfall in any case. What is the best solution for such situations?
As workaround one may use NativeInt
instead of Integer
, however this Integer
type was not under our control.
I don't think this is a bug. The behaviour is by design. Without inspection or perhaps some compile time support for understanding these types, it is hard to write a general purpose comparer for arbitrary structured types.
The default record comparer can only safely be used on types with no padding and containing only certain simple value types that can be compared using naive binary comparison. For instance, floating point types are out because their comparison operators are more complex. Think of NaNs, negative zero, etc.
I think that the only robust way to deal with this is to write your own equality comparer. Others have suggested default initialising all record instances, but this places a significant burden on consumers of such types, and runs the risk of obscure and hard to track down defects in case some code forgets to default initialise.
I would use TEqualityComparer<T>.Construct
to create suitable equality comparers. This requires the least amount of boilerplate. You supply two anonymous methods: an equals function and a hash function, and Construct
returns you a newly minted comparer.
You can wrap this up in a generic class like so:
uses
System.Generics.Defaults,
System.Generics.Collections;
{$IFOPT Q+}
{$DEFINE OverflowChecksEnabled}
{$Q-}
{$ENDIF}
function CombinedHash(const Values: array of Integer): Integer;
var
Value: Integer;
begin
Result := 17;
for Value in Values do begin
Result := Result*37 + Value;
end;
end;
{$IFDEF OverflowChecksEnabled}
{$Q+}
{$ENDIF}
type
TPairComparer = class abstract
public
class function Construct<TKey, TValue>(
const EqualityComparison: TEqualityComparison<TPair<TKey, TValue>>;
const Hasher: THasher<TPair<TKey, TValue>>
): IEqualityComparer<TPair<TKey, TValue>>; overload; static;
class function Construct<TKey, TValue>: IEqualityComparer<TPair<TKey, TValue>>; overload; static;
end;
class function TPairComparer.Construct<TKey, TValue>(
const EqualityComparison: TEqualityComparison<TPair<TKey, TValue>>;
const Hasher: THasher<TPair<TKey, TValue>>
): IEqualityComparer<TPair<TKey, TValue>>;
begin
Result := TEqualityComparer<TPair<TKey, TValue>>.Construct(
EqualityComparison,
Hasher
);
end;
class function TPairComparer.Construct<TKey, TValue>: IEqualityComparer<TPair<TKey, TValue>>;
begin
Result := Construct<TKey, TValue>(
function(const Left, Right: TPair<TKey, TValue>): Boolean
begin
Result :=
TEqualityComparer<TKey>.Default.Equals(Left.Key, Right.Key) and
TEqualityComparer<TValue>.Default.Equals(Left.Value, Right.Value);
end,
function(const Value: TPair<TKey, TValue>): Integer
begin
Result := CombinedHash([
TEqualityComparer<TKey>.Default.GetHashCode(Value.Key),
TEqualityComparer<TValue>.Default.GetHashCode(Value.Value)
]);
end
)
end;
I've provided two overloads. If the default comparers for your two types are sufficient, then you can use the parameterless overload. Otherwise you can supply two anonymous methods bespoke to the types.
For your type, you would obtain a comparer like this:
TPairComparer.Construct<Int64, Integer>
Both of these simple types have default equality comparers that you can use. Hence the parameterless Construct
overload can be used.
The default comparer for records only works for records of pure value types without padding. Relying on it, generally, is not a good idea. For any records which require accurate hashing and equality comparison you really need to write your own comparers.
As has been noted, initializing all of your records with Default()
is also an option, but this approach is both tedious and error prone - it is easy to forget to initialize a record and it is difficult to trace such an omission when it happens. The approach is also only effective at eliminating errors related to padding while a custom comparer can also handle reference types, etc.
This, for example, demonstrates a working solution to the problem :
program Project1;
uses
SysUtils, Windows, StrUtils, Generics.Collections, Generics.Defaults,
System.Hash;
type
TPairComparer<TKey, TValue> = class(TEqualityComparer<TPair<TKey, TValue>>)
public
function Equals(const Left, Right: TPair<TKey, TValue>): Boolean; override;
function GetHashCode(const Value: TPair<TKey, TValue>): Integer; override;
end;
TInt64IntDict<TValue> = class(TDictionary<TPair<Int64, Integer>, TValue>)
public constructor Create;
end;
function TPairComparer<TKey, TValue>.Equals(const Left: TPair<TKey, TValue>;
const Right: TPair<TKey, TValue>) : boolean;
begin
result := TEqualityComparer<TKey>.Default.Equals(Left.Key, Right.Key) and
TEqualityComparer<TValue>.Default.Equals(Left.Value, Right.Value);
end;
{$IFOPT Q+}
{$DEFINE OVERFLOW_ON}
{$Q-}
{$ELSE}
{$UNDEF OVERFLOW_ON}
{$ENDIF}
function TPairComparer<TKey, TValue>.GetHashCode(const Value: TPair<TKey, TValue>) : integer;
begin
result := THashBobJenkins.GetHashValue(Value.Key, SizeOf(Value.Key), 23 * 31);
result := THashBobJenkins.GetHashValue(Value.Value, SizeOf(Value.Value), result * 31);
end;
{$IFDEF OVERFLOW_ON}
{$Q+}
{$UNDEF OVERFLOW_ON}
{$ENDIF}
constructor TInt64IntDict<TValue>.Create;
begin
inherited Create(0, TPairComparer<Int64, Integer>.Create);
end;
var
i1, i2: TPair<Int64, Integer>;
LI64c : TPairComparer<Int64, Integer>;
LDict : TInt64IntDict<double>;
begin
FillMemory(@i1, SizeOf(i1), $00);
FillMemory(@i2, SizeOf(i1), $01);
i1.Key := 2;
i1.Value := -1;
i2.Key := i1.Key;
i2.Value := i1.Value;
WriteLn(Format('i1 key = %d, i1 value = %d', [i1.Key, i1.Value]));
WriteLn(Format('i2 key = %d, i2 value = %d', [i2.Key, i2.Value]));
WriteLn; WriteLn('Using Default comparer');
if TEqualityComparer<TPair<Int64, Integer>>.Default.Equals(i1, i2) then
WriteLn('i1 equals i2') else WriteLn('i1 not equals i2');
if TEqualityComparer<TPair<Int64, Integer>>.Default.GetHashCode(i1) =
TEqualityComparer<TPair<Int64, Integer>>.Default.GetHashCode(i2) then
WriteLn('i1, i2 - hashes match') else WriteLn('i1, i2 - hashes do not match');
WriteLn; WriteLn('Using custom comparer');
LI64c := TPairComparer<Int64, Integer>.Create;
if LI64c.Equals(i1, i2) then
WriteLn('i1 equals i2') else WriteLn('i1 not equals i2');
if LI64c.GetHashCode(i1) = LI64c.GetHashCode(i2) then
WriteLn('i1, i2 - hashes match') else WriteLn('i1, i2 - hashes do not match');
WriteLn;
LDict := TInt64IntDict<double>.Create;
LDict.Add(i1, 1.23);
if LDict.ContainsKey(i2) then
WriteLn('Dictionary already contains key') else
WriteLn('Dictionary does not contain key');
ReadLn;
end.
This produces output
i1 key = 2, i1 value = -1
i2 key = 2, i2 value = -1Using Default comparer
i1 not equals i2
i1, i2 - hashes do not matchUsing custom comparer
i1 equals i2
i1, i2 - hashes matchDictionary already contains key
That said, as David's answer demonstrates, using a delegated comparer will produce less overhead and should be favoured in practice.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With