Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Variable Type for searching both with id and string

I need to store in memory about 500-1000 entries of 3 fields with quick and effective search by both int and str values. Searching happens in quick bursts of about 300-500 requests. I'm not sure how to efficiently do it.

Stored data consists of 3 fields:

  1. ID - Integer, which won't be sequential. I.e. it will be like (3, 5, 12, 55), but not (1, 2, 3, 4, 5).
  2. Name - String.
  3. Tags - String.

There are 3 possible scenarios:

  1. Get ID by Name.
  2. Get Name by ID.
  3. Get Tags by ID.

Currently, I use two different Types:

  1. THashedStringList with keypairs '%s=%i' to search by name.
  2. Array of Records sorted by ID for other searches.

I find this highly inefficient and currently looking for new ideas. Any hints?

like image 599
Alexander Avatar asked Jul 26 '20 17:07

Alexander


People also ask

What is the best data type for user ID?

A 64 bit int is plenty. Many databases won't have or won't use that integer type but will use a NUMBER type with specified scale and precision.


2 Answers

As David Heffernan suggested, you might want to use a proper database for this.

But if you want a more lightweight solution, with excellent performance, you can use an object list to store all your items and two dictionaries that refer to these items by their IDs and names, respectively.

As an example, consider a frog:

type
  TFrog = class
    ID: Integer;
    Name: string;
    Address: string;
  end;

Like your example, this class has one integer and two string members. We assume that every frog has a unique ID and a unique name. (But two or more frogs may share the same address.)

Just so we will be able to test the performance, we create a primitive frog generation function:

function CreateRandomFrog: TFrog;
const
  FrogFirstNames: array[0..11] of string =
    ('Luke', 'Smith', 'John', 'Maggie', 'Rose', 'Bill', 'Edward', 'Harry',
     'Andrew', 'Michael', 'Molly', 'Arthur');
  FrogLastNames: array[0..7] of string =
    ('Jones', 'Stone', 'Rock', 'Hill', 'Waterfall', 'Sky', 'Flower', 'Rain');
  FrogInitials: array[0..25] of Char = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  FrogAddressesPrefixes: array[0..3] of string =
    ('Tree', 'Swamp', 'Lawn', 'Lake');
begin
  Result := TFrog.Create;
  try
    Result.ID := Random(10*N);
    Result.Name := FrogFirstNames[Random(Length(FrogFirstNames))] + #32 +
      FrogInitials[Random(Length(FrogInitials))] + '.' +
      FrogInitials[Random(Length(FrogInitials))] + '.' +
      FrogInitials[Random(Length(FrogInitials))] + '.' + #32 +
      FrogLastNames[Random(Length(FrogLastNames))];
    Result.Address := FrogAddressesPrefixes[Random(Length(FrogAddressesPrefixes))] +
      #32 + Random(Byte.MaxValue).ToString;
  except
    Result.Free;
    raise;
  end;
end;

This will create frogs like

ID: 123
Name: Bill D.H.H. Rock
Address: Tree 52

We also define a constant

const
  N = 1000000;

This is the number of frogs we will create at the same time.

Now, some action: Define a class

type
  TFrogFarm = class
    Frogs: TObjectList<TFrog>;
    FrogsByID: TDictionary<Integer, TFrog>;
    FrogsByName: TDictionary<string, TFrog>;
    constructor Create;
    destructor Destroy; override;
    procedure TrySearchFarm;
  end;

The idea is that the Frogs list owns the frog objects, while the FrogsByID and FrogsByName dictionaries only refer to the frog objects without owning them. These are dictionaries using the IDs and the names as their keys.

Implement it like so:

{ TFrogFarm }

constructor TFrogFarm.Create;
var
  Frog: TFrog;
begin

  // Create the list that owns the frog objects
  Frogs := TObjectList<TFrog>.Create;

  // Create the dictionaries that refer to the frog objects without owning them
  FrogsByID := TDictionary<Integer, TFrog>.Create;
  FrogsByName := TDictionary<string, TFrog>.Create;

  // Create N random frogs with unique IDs and names
  repeat
    Frog := CreateRandomFrog;
    if not FrogsByID.ContainsKey(Frog.ID) and not FrogsByName.ContainsKey(Frog.Name) then
    begin
      Frogs.Add(Frog); // transfer of ownership
      FrogsByID.Add(Frog.ID, Frog);
      FrogsByName.Add(Frog.Name, Frog);
    end
    else
      Frog.Free; // if this weren't a simple test project, we'd protect this better
  until Frogs.Count = N;

end;

destructor TFrogFarm.Destroy;
begin
  FreeAndNil(FrogsByName);
  FreeAndNil(FrogsByID);
  FreeAndNil(Frogs);
  inherited;
end;

procedure TFrogFarm.TrySearchFarm;
var
  Frog: TFrog;
  S1, S2: string;
  c1, c2, f: Int64;
begin

  QueryPerformanceFrequency(f);
  QueryPerformanceCounter(c1);

  if FrogsByID.TryGetValue(100, Frog) then
    S1 := 'There is a frog with ID 100.'#13#10'He or she lives at ' + Frog.Address + '.'
  else
    S1 := 'There is NO frog with ID 100.';

  if FrogsByName.TryGetValue('Maggie A.M.D. Flower', Frog) then
    S2 := 'There is a frog named "Maggie A.M.D. Flower".'#13#10'She lives at ' + Frog.Address + '.'
  else
    S2 := 'There is NO frog named "Maggie A.M.D. Flower".';

  QueryPerformanceCounter(c2);

  ShowMessage(S1 + sLineBreak + sLineBreak + S2 + sLineBreak + sLineBreak +
    'Execution time: ' + Round(1000000*(c2 - c1)/f).ToString + ' µs');

end;

To try this, do

begin
  Randomize;
  while True do
    with TFrogFarm.Create do
      try
        TrySearchFarm;
      finally
        Free;
      end;
end;

Finding an element in a dictionary is an O(1) operation, so it is fast even in very large collections. And, indeed, with one million frogs in the farm (N = 1000000), lookup takes about 2 microseconds on my system:

Screenshot of the program: A message dialog with text "There is NO frog with ID 100. There is a frog named 'Maggie A.M.D. Flower'. She lives at Swamp 211. Execution time: 2 µs".

like image 200
Andreas Rejbrand Avatar answered Oct 24 '22 04:10

Andreas Rejbrand


I've put together this answer at Andreas Rejbrand's suggestion, as a counterpoint to his TDictionary-based answer. It is unlikely to be able to perform as well as that, but is simpler in some respects to set up.

It shows up the limitations of a TDataSet-based approach in a number of respects, the principal one of which is the need to have maximum field sizes for the strings fields. FireDAC supports ftWideString fields, but that doesn't mean that you should use them to store Delphi "huge" strings.

For searching, I've used the standard dataset Locate function, but if you were after optimisation, it would probably be better to have indexes set for the different types of search and select the right one at run-time.

i was uncertain of how you intend to use the Tags field. If you wanted to have an arbitrary number of tags per record, it would be better to put these in a dataset on the detail side of a master-detail relation with FDMemTable1. Left as an exercise for the reader.

procedure TForm2.FormCreate(Sender: TObject);
var
  AField : TField;
  i : Integer;
begin
  AField := TIntegerField.Create(Self);
  AField.FieldName := 'ID';
  AField.DataSet := FDMemTable1;

  AField := TStringField.Create(Self);
  AField.FieldName := 'Name';
  AField.Size := 80;
  AField.DataSet := FDMemTable1;

  AField := TStringField.Create(Self);
  AField.FieldName := 'Tags';
  AField.Size := 80;
  AField.DataSet := FDMemTable1;

  // FDMemTable1.IndexFieldNames := 'Name;ID';
  FDMemTable1.CreateDataSet;

  FDMemTable1.DisableControls;
  try
    for i := 1 to 1000 do
      FDMemTable1.InsertRecord([i, 'Frog' + IntToStr(i), Chr(Ord('A') + Random(26))]);
  finally
    FDMemTable1.EnableControls;
  end;
end;

function TForm2.FindByName(const AName : String) : Boolean;
begin
  Result := FDMemTable1.Locate('Name', AName, []);
end;

function TForm2.FindByID(const AID: Integer) : Boolean;
begin
  Result := FDMemTable1.Locate('ID', ID, []);
end;
like image 37
MartynA Avatar answered Oct 24 '22 05:10

MartynA