Question
What is the fastest way to find if an IP address exists in a file that contains IP addresses sorted as:
219.93.88.62 219.94.181.87 219.94.193.96 220.1.72.201 220.110.162.50 220.126.52.187 220.126.52.247
Constraints
File Details
Possible Solutions
is_dir()
(sadly, this uses 87 megabytes)Scanning the file line by line to find an IP seems like a pain if you have 9,000 non-matches to check before you get to 232.0.17.1
Is your file constrained to a single file? e.g. lets say this list is banned IPs and you just want to see if one is "in" the list.
What if you made a DIR to contain multiple files:
BannedIPs
+- 0.ips
+- 1.ips
+- 37.ips
+- 123.ips
+- 253.ips
+- 254.ips
Each file only contains IP addresses that start with that number.
If you were lucky enough to have even distribution... you'd have 256 files, but each would only have ~37 entries.
Thus when you want to test: 232.0.17.1
you look in the 232.ips
file and scan for it.
Since your file stores the IP addresses in sorted order already you can quickly find a specific IP address in O(log(n)) time by using a binary search.
If you want to speed this up even further you can cache for example every 100th row in memory and use an in-memory binary search first, then you know which part of the file you need to read in to finish the search.
Having said that 131kB is really not that much, so the simplest and fastest solution is to buy more memory and cache the entire file in memory in a hash table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With