Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Given a flat file of IP Ranges and mappings, find a city given an IP

This is the question:

Given a flat text file that contains a range of IP addresses that map to a location (e.g. 192.168.0.0-192.168.0.255 = Boston, MA), come up with an algorithm that will find a city for a specific ip address if a mapping exists.

My only idea is parse the file, and turn the IP ranges into just ints (multiplying by 10/100 if it's missing digits) and place them in a list, while also putting the lower of the ranges into a hash as the key with the location as a value. Sort the list and perform a slightly modified binary search. If the index is odd, -1 and look in the hash. If it's even, just look in the hash.

Any faults in my plans, or better solutions?

like image 355
Joe legrae Avatar asked Mar 06 '12 03:03

Joe legrae


2 Answers

Your approach seems perfectly reasonable.

If you are interested in doing a bit of research / extra coding, there are algorithms that will asymptotically outperform the standard binary search technique that rely on the fact that your IP addresses can be interpreted as integers in the range from 0 to 231 - 1. For example, the van Emde Boas tree and y-Fast Trie data structures can implement the predecessor search operation that you're looking at in time O(log log U), where U is the maximum possible IP address, as opposed to the O(log N) approach that binary search uses. The constant factors are higher, though, which means that there is no guarantee that this approach will be any faster. However, it might be worth exploring as another approach that could potentially be even faster.

Hope this helps!

like image 91
templatetypedef Avatar answered Oct 19 '22 03:10

templatetypedef


The problem smells of ranges, and one of the good data-structures for this problem would be a Segment Tree. Some resources to help you get started.

The root of the segment tree can represent the addresses (0.0.0.0 - 255.255.255.255). The left sub-tree would represent the addresses (0.0.0.0 - 127.255.255.255) and the right sub-tree would represent the range (128.0.0.0 - 255.255.255.255), and so on. This will go on till we reach ranges which cannot be further sub-divided. Say, if we have the range 32.0.0.0 - 63.255.255.255, mapped to some arbitrary city, it will be a leaf node, we will not further subdivide that range when we arrive there, and tag it to the specific city.

To search for a specific mapping, we follow the tree, just as we do in a Binary Search Tree. If your IP lies in the range of the left sub-tree, move to the left sub-tree, else move to the right sub-tree.

The good parts:

  1. You need not have all sub-trees, only add the sub-trees which are required. For example, if in your data, there is no city mapped for the range (0.0.0.0 - 127.255.255.255), we will not construct that sub-tree.
  2. We are space efficient. If the entire range is mapped to one city, we will create only the root node!
  3. This is a dynamic data-structure. You can add more cities, split-up ranges later on, etc.
  4. You will be making constant number of operations, since the maximum depth of the tree would be 4 x log2(256) = 32. For this particular problem it turns out that Segment Trees would be as fast as van-Emde Boas trees, and require lesser space (O(N)).
  5. This is a simple, but non-trivial data-structure, which is better than sorting, because it is dynamic, and easier to explain to your interviewer than van-Emde Boas trees.
  6. This is one of the easiest non-trivial data-structures to code :)

Please note that in some Segment Tree tutorials, they use arrays to represent the tree. This is probably not what you want, since we would not be populating the entire tree, so dynamically allocating nodes, just like we do in a standard Binary Tree is the best.

like image 42
reddragon Avatar answered Oct 19 '22 04:10

reddragon