Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast file search algorithm for IP addresses

Question

What is the fastest way to find if an IP address exists in a file that contains IP addresses sorted as:

219.93.88.62
219.94.181.87
219.94.193.96
220.1.72.201
220.110.162.50
220.126.52.187
220.126.52.247

Constraints

  • No database (e.g., MySQL, PostgreSQL, Oracle, etc.)
  • Infrequent pre-processing is allowed (see possibilities section)
  • Would be nice not to have to load the file each query (131Kb)
  • Uses under 5 megabytes of disk space
  • No extra PHP modules

File Details

  • One IP address per line
  • 9500+ lines

Possible Solutions

  • Create a directory hierarchy (radix tree?) then use is_dir() (sadly, this uses 87 megabytes)
like image 753
Dave Jarvis Avatar asked Dec 29 '22 19:12

Dave Jarvis


2 Answers

Scanning the file line by line to find an IP seems like a pain if you have 9,000 non-matches to check before you get to 232.0.17.1

Is your file constrained to a single file? e.g. lets say this list is banned IPs and you just want to see if one is "in" the list.

What if you made a DIR to contain multiple files:

BannedIPs
  +- 0.ips
  +- 1.ips
  +- 37.ips
  +- 123.ips
  +- 253.ips
  +- 254.ips

Each file only contains IP addresses that start with that number.

If you were lucky enough to have even distribution... you'd have 256 files, but each would only have ~37 entries.

Thus when you want to test: 232.0.17.1 you look in the 232.ips file and scan for it.

like image 153
scunliffe Avatar answered Jan 14 '23 08:01

scunliffe


Since your file stores the IP addresses in sorted order already you can quickly find a specific IP address in O(log(n)) time by using a binary search.

If you want to speed this up even further you can cache for example every 100th row in memory and use an in-memory binary search first, then you know which part of the file you need to read in to finish the search.

Having said that 131kB is really not that much, so the simplest and fastest solution is to buy more memory and cache the entire file in memory in a hash table.

like image 36
Mark Byers Avatar answered Jan 14 '23 08:01

Mark Byers