I have a large txt (ipaddress.txt) with a lot of lines, each line is a IP Address, eg.:
44.XXX.XXX.XXX
45.XXX.XXX.XXX
46.XXX.XXX.XXX
47.XXX.XXX.XXX
48.XXX.XXX.XXX
i load this list in a TStringList
, eg.:
FIpAddressList := TStringList.Create();
FIpAddressList.Sorted := true;
FIpAddressList.LoadFromFile('ipaddress.txt');
and i want check if a IP Address is in the list, eg.:
function IsIPinList(const IPAddress : string): Boolean;
begin
Result := (FIpAddressList.IndexOf(IPAddress) <> -1);
end;
it works... but is slow with huge TStringList
.
Is there any way to make this process faster?
UPDATE
One way to make this quicker is to arrange for the list to be sorted so that the search can be performed using binary search, O(log n), rather than linear search, O(n).
If the list is static then the best thing to do is to sort the list outside your program so that you can sort it exactly once.
If the list is more dynamic then you will have to sort it, and keep any modifications ordered. In this scenario, sorting the list will only bring benefits if the number of searches you make is sufficient to overcome the extra cost of sorting and maintaining the order.
Another approach is to use dictionary containing your items. Typically this will involve hashing each string. Whilst the lookup complexity is O(n) the cost of hashing can be significant.
Yet another way to speed things up is to convert the IP address strings to 32 bit integers. In fact this is sure to yield a huge performance benefit, and you should do this irrespective of any other concerns. It is always faster and clearer to work with a 32 bit integer than a string representation of an IP address.
These are just some possible approaches, there are more. Which to choose depends very much on the usage trade offs.
Whilst you probably are just looking for some code to solve your problem, it is my view that this problem is more algorithmic than code based. You need to better understand the requirements, data size constraints, etc. before selecting an algorithm. Once you have decided on an algorithm, or narrowed the choice down to a small number to compare, implementation should be straightforward.
Another possibility is that you have misdiagnosed your problem. Even linear search over a list of 5000 IP addresses stored as strings should be quicker than you are reporting:
So, the performance of your search could be improved easily by a factor of 20,000, I still don't think that your performance problems are down to what you believe. I wonder if your real problem that that you are reading the file from disk every time you search.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With