I am looking for a data structure that lets me quickly (preferably in O(1)) determine whether a given GUID is a member of a collection of GUIDs.
My current approach is to use a TDictionary with 0 as the values.
While this works quickly, it seems wasteful to use a hash map to rehash a GUID, which is by definition considered unique, and to have the dictionary store values that are never needed.
There must be a better solution for this, but I can't find one. Can you?
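Very few data structures offer O(1) access. One is the array, another is the hash map (David's answer; O(1) on average), and I know of only one more: the trie. Here is a simple implementation of a bit-wise trie. A lookup walks one level per bit, so it takes at most 128 steps no matter how many GUIDs are stored, which is why it counts as O(1). It has some interesting properties: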
The code:
```delphi
program Project23;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  PGuidTrieNode = ^TGuidTrieNode;
  TGuidTrieNode = record
    // One child per bit value. Below the 128th bit, Pointer(1) is stored
    // as a "GUID present" marker instead of a real node.
    Sub: array[Boolean] of PGuidTrieNode;
  end;

  TGuidByteArray = array[0..15] of Byte;

  TGuidTrie = class
  protected
    Root: PGuidTrieNode;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Add(G: TGUID);
    function Exists(G: TGUID): Boolean;
  end;

const
  BitMask: array[0..7] of Byte = (1, 2, 4, 8, 16, 32, 64, 128);

{ TGuidTrie }

constructor TGuidTrie.Create;
begin
  inherited Create;
  // AllocMem zero-fills the block, so both children start out as nil;
  // uninitialized memory would make the Assigned checks unreliable
  Root := AllocMem(SizeOf(TGuidTrieNode));
end;

destructor TGuidTrie.Destroy;

  procedure KillNode(Node: PGuidTrieNode);
  var
    B: Boolean;
  begin
    // Free both subtrees; Pointer(1) is the payload marker, not a node
    for B := False to True do
      if Assigned(Node.Sub[B]) and (Node.Sub[B] <> Pointer(1)) then
        KillNode(Node.Sub[B]);
    FreeMem(Node);
  end;

begin
  if Assigned(Root) then
    KillNode(Root);
  inherited;
end;

procedure TGuidTrie.Add(G: TGUID);
var
  GBA: TGuidByteArray absolute G;
  Node: PGuidTrieNode;
  i, Bit: Integer;
  IsBitSet: Boolean;
begin
  Assert(SizeOf(G) = SizeOf(TGuidByteArray));
  Node := Root;
  for i := 0 to High(GBA) do
    for Bit := 0 to 7 do
    begin
      IsBitSet := (GBA[i] and BitMask[Bit]) <> 0;
      if (i = High(GBA)) and (Bit = 7) then
        // Last bit: store the payload marker instead of allocating a node
        Node.Sub[IsBitSet] := Pointer(1)
      else
      begin
        if not Assigned(Node.Sub[IsBitSet]) then
          Node.Sub[IsBitSet] := AllocMem(SizeOf(TGuidTrieNode));
        Node := Node.Sub[IsBitSet];
      end;
    end;
end;

function TGuidTrie.Exists(G: TGUID): Boolean;
var
  GBA: TGuidByteArray absolute G;
  Node: PGuidTrieNode;
  i, Bit: Integer;
  IsBitSet: Boolean;
begin
  Assert(SizeOf(G) = SizeOf(TGuidByteArray));
  Node := Root;
  // Walks at most 128 levels, however many GUIDs are stored
  for i := 0 to High(GBA) do
    for Bit := 0 to 7 do
    begin
      IsBitSet := (GBA[i] and BitMask[Bit]) <> 0;
      if not Assigned(Node.Sub[IsBitSet]) then
      begin
        Result := False;
        Exit;
      end;
      Node := Node.Sub[IsBitSet];
    end;
  Result := True; // Node now holds the Pointer(1) payload marker
end;

const
  G1: TGUID = '{68D09F12-3E0D-4963-B32C-4EE3BD90F69C}';
  G2: TGUID = '{BEED37F6-9757-41DC-8463-AF094392652B}';

var
  T: TGuidTrie;
begin
  try
    T := TGuidTrie.Create;
    try
      if T.Exists(G1) then WriteLn('Exists') else WriteLn('NOT Exists');
      T.Add(G1);
      if T.Exists(G1) then WriteLn('Exists') else WriteLn('NOT Exists');
      if T.Exists(G2) then WriteLn('Exists') else WriteLn('NOT Exists');
      T.Add(G2);
      if T.Exists(G2) then WriteLn('Exists') else WriteLn('NOT Exists');
    finally
      T.Free;
    end;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
```
I think you are 99% of the way there.
Hashing sounds like the right solution. The obvious way to take advantage of the special nature of a GUID is to supply your own hash function that combines the four 32-bit integers making up the GUID into a single 32-bit integer. I'd just XOR the four integers together.
I presume you are using Generics.Collections.TDictionary. You can supply your own hash function by passing a custom equality comparer (an IEqualityComparer<TGUID>) to its constructor. I wouldn't worry about storing the unused values; I don't think they will affect performance in any discernible way.
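A minimal sketch of the XOR-hash idea, assuming Delphi 2009 or later (generics and anonymous methods) and the `TEqualityComparer<T>.Construct` factory from Generics.Defaults. `GuidXorHash`, `MakeGuidXorComparer` and `TGuidWords` are illustrative names, not part of the RTL. Equality still compares the full GUID with `IsEqualGUID`; the XOR only chooses the bucket.

```delphi
program GuidXorHashDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Generics.Defaults, Generics.Collections;

type
  // Illustrative helper type: a TGUID viewed as four 32-bit words
  TGuidWords = array[0..3] of Cardinal;

const
  G1: TGUID = '{68D09F12-3E0D-4963-B32C-4EE3BD90F69C}';
  G2: TGUID = '{BEED37F6-9757-41DC-8463-AF094392652B}';

// Hash = XOR of the four 32-bit words that make up the GUID
function GuidXorHash(const G: TGUID): Integer;
var
  W: TGuidWords;
begin
  Move(G, W, SizeOf(W)); // TGUID is exactly 16 bytes
  Result := Integer(W[0] xor W[1] xor W[2] xor W[3]);
end;

// Equality still compares all 128 bits; only the hash is customized
function MakeGuidXorComparer: IEqualityComparer<TGUID>;
begin
  Result := TEqualityComparer<TGUID>.Construct(
    function(const Left, Right: TGUID): Boolean
    begin
      Result := IsEqualGUID(Left, Right);
    end,
    function(const Value: TGUID): Integer
    begin
      Result := GuidXorHash(Value);
    end);
end;

var
  Guids: TDictionary<TGUID, Integer>;
begin
  Guids := TDictionary<TGUID, Integer>.Create(MakeGuidXorComparer());
  try
    Guids.AddOrSetValue(G1, 0); // dummy value; only the key matters
    WriteLn('G1: ', BoolToStr(Guids.ContainsKey(G1), True));
    WriteLn('G2: ', BoolToStr(Guids.ContainsKey(G2), True));
  finally
    Guids.Free;
  end;
end.
```

XOR keeps hashing cheap; if two GUIDs happen to XOR to the same value they simply share a bucket, and the full equality check keeps lookups correct.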
I trust that you are storing your GUIDs as 128-bit binary values (TGUID) and not as strings.
Finally, it occurred to me that the default comparer for a GUID might already generate its hash code this way. It's worth checking before making any changes.
EDIT
The default hash code applies Bob Jenkins' hash to the binary data. An XOR would be faster, but the default hash code is unlikely to be a performance bottleneck.
In other words, I think that TDictionary<TGUID,Integer> will serve your needs perfectly adequately.
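For reference, a minimal sketch of that default setup, assuming Delphi 2009 or later. No comparer is passed, so TDictionary uses its default equality comparer for TGUID; the dummy Integer value mirrors the question's approach, and `ContainsKey` is the membership test. The two GUID constants are reused from the trie demo.

```delphi
program GuidSetDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Generics.Collections;

const
  G1: TGUID = '{68D09F12-3E0D-4963-B32C-4EE3BD90F69C}';
  G2: TGUID = '{BEED37F6-9757-41DC-8463-AF094392652B}';

var
  Guids: TDictionary<TGUID, Integer>;
begin
  // No comparer passed: the default equality comparer for TGUID is used
  Guids := TDictionary<TGUID, Integer>.Create;
  try
    Guids.AddOrSetValue(G1, 0); // dummy value, as in the question
    WriteLn('G1: ', BoolToStr(Guids.ContainsKey(G1), True));
    WriteLn('G2: ', BoolToStr(Guids.ContainsKey(G2), True));
  finally
    Guids.Free;
  end;
end.
```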