Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delphi: search string in a TStringList

Tags:

delphi

I have a large txt (ipaddress.txt) with a lot of lines, each line is a IP Address, eg.:

44.XXX.XXX.XXX
45.XXX.XXX.XXX
46.XXX.XXX.XXX
47.XXX.XXX.XXX
48.XXX.XXX.XXX

i load this list in a TStringList, eg.:

FIpAddressList := TStringList.Create();
FIpAddressList.Sorted := true;
FIpAddressList.LoadFromFile('ipaddress.txt');

and i want check if a IP Address is in the list, eg.:

function IsIPinList(const IPAddress : string): Boolean;
begin
   Result := (FIpAddressList.IndexOf(IPAddress) <> -1);
end;

it works... but is slow with huge TStringList.

Is there any way to make this process faster?

UPDATE

  • The list is static with monthly update, with less or more 5'000 lines.
  • The function is invoked at each request on a server (eg. 5 times per seconds).
  • The list is loaded only when the server service start.
like image 798
ar099968 Avatar asked Dec 24 '22 22:12

ar099968


1 Answers

One way to make this quicker is to arrange for the list to be sorted so that the search can be performed using binary search, O(log n), rather than linear search, O(n).

If the list is static then the best thing to do is to sort the list outside your program so that you can sort it exactly once.

If the list is more dynamic then you will have to sort it, and keep any modifications ordered. In this scenario, sorting the list will only bring benefits if the number of searches you make is sufficient to overcome the extra cost of sorting and maintaining the order.

Another approach is to use dictionary containing your items. Typically this will involve hashing each string. Whilst the lookup complexity is O(n) the cost of hashing can be significant.

Yet another way to speed things up is to convert the IP address strings to 32 bit integers. In fact this is sure to yield a huge performance benefit, and you should do this irrespective of any other concerns. It is always faster and clearer to work with a 32 bit integer than a string representation of an IP address.

These are just some possible approaches, there are more. Which to choose depends very much on the usage trade offs.

Whilst you probably are just looking for some code to solve your problem, it is my view that this problem is more algorithmic than code based. You need to better understand the requirements, data size constraints, etc. before selecting an algorithm. Once you have decided on an algorithm, or narrowed the choice down to a small number to compare, implementation should be straightforward.


Another possibility is that you have misdiagnosed your problem. Even linear search over a list of 5000 IP addresses stored as strings should be quicker than you are reporting:

  • My computer can search such a list 2,000 times a second using linear search.
  • Once you sort the list and switch to binary search, then it can manage 1,000,000 searches a second.
  • Switch to storing an ordered array of integers, and it achieves over 10,000,000 searches a second.
  • With a hash based dictionary of integers gives performance twice as fast again.

So, the performance of your search could be improved easily by a factor of 20,000, I still don't think that your performance problems are down to what you believe. I wonder if your real problem that that you are reading the file from disk every time you search.

like image 84
David Heffernan Avatar answered Jan 12 '23 03:01

David Heffernan