Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing a Hilbert map of the Internet

In the XKCD comic 195 a design for a map of the Internet address space is suggested using a Hilbert curve so that items from a similar IP adresses will be clustered together.

Given an IP address, how would I calculate its 2D coordinates (in the range zero to one) on such a map?

like image 715
Martin Avatar asked Dec 29 '09 20:12

Martin


2 Answers

This is pretty easy, since the Hilbert curve is a fractal, that is, it is recursive. It works by bisecting each square horizontally and vertically, dividing it into four pieces. So you take two bits of the IP address at a time, starting from the left, and use those to determine the quadrant, then continue, using the next two bits, with that quadrant instead of the whole square, and so on until you have exhausted all the bits in the address.

The basic shape of the curve in each square is horseshoe-like:

 0 3
 1 2

where the numbers correspond to the top two bits and therefore determine the traversal order. In the xkcd map, this square is the traversal order at the highest level. Possibly rotated and/or reflected, this shape is present at each 2x2 square.

Determination of how the "horseshoe" is oriented in each of the subsquares is determined by one rule: the 0 corner of the 0 square is in the corner of the larger square. Thus, the subsquare corresponding to 0 above must be traversed in the order

 0 1
 3 2

and, looking at the whole previous square and showing four bits, we get the following shape for the next division of the square:

 00 01 32 33
 03 02 31 30
 10 13 20 23
 11 12 21 22

This is how the square always gets divided at the next level. Now, to continue, just focus on the latter two bits, orient this more detailed shape according to how the horseshoe shape of those bits is oriented, and continue with a similar division.

To determine the actual coordinates, each two bits determine one bit of binary precision in the real number coordinates. So, on the first level, the first bit after the binary point (assuming coordinates in the [0,1] range) in the x coordinate is 0 if the first two bits of the address have the value 0 or 1, and 1 otherwise. Similarly, the first bit in the y coordinate is 0 if the first two bits have the value 1 or 2. To determine whether to add a 0 or 1 bit to the coordinates, you need to check the orientation of the horseshoe at that level.

EDIT: I started working out the algorithm and it turns out that it's not that hard after all, so here's some pseudo-C. It's pseudo because I use a b suffix for binary constants and treat integers as arrays of bits, but changing it to proper C shouldn't be too hard.

In the code, pos is a 3-bit integer for the orientation. The first two bits are the x and y coordinates of 0 in the square and the third bit indicates whether 1 has the same x coordinate as 0. The initial value of pos is 011b, meaning that the coordinates of 0 are (0, 1) and 1 has the same x coordinate as 0. ad is the address, treated as an n-element array of 2-bit integers, and starting from the most significant bits.

double x = 0.0, y = 0.0;
double xinc, yinc;
pos = 011b;
for (int i = 0; i < n; i++) {
    switch (ad[i]) {
        case 0: xinc = pos[0]; yinc = pos[1]; pos[2] = ~pos[2]; break;
        case 1: xinc = pos[0] ^ ~pos[2]; yinc = pos[1] ^ pos[2]; break;
        case 2: xinc = ~pos[0]; yinc = ~pos[1]; break;
        case 3: xinc = pos[0] ^ pos[2]; yinc = pos[1] ^ ~pos[2];
            pos = ~pos; break;
    }
    x += xinc / (1 << (i+1)); y += yinc / (1 << (i+1));
}

I tested it with a couple of 8-bit prefixes and it placed them correctly according to the xkcd map, so I'm somewhat confident the code is correct.

like image 173
JaakkoK Avatar answered Nov 12 '22 20:11

JaakkoK


Essentially you would decompose the number, using pairs of bits, MSB to LSB. The pair of bits tells you if the location is in the Upper Left (0) Lower Left (1) Lower Right (2) or Upper Right (3) quadrant, at a scale that gets finer as you shift through the number.

Additionally, you need to track an "orientation". This is the winding that is used at the scale you are at; the initial winding is as above (UL, LL, LR, UR), and depending on which quadrant you end up in, the winding at the next scale down is (rotated -90, 0, 0, +90) from your current winding.

So you could accumulate offsets :

suppose I start at 0,0, and the first pair gives me a 2, I shift offsets to 0.5, 0.5. The winding in the lower right is the same as my initial one. The next pair reduces the scale, so my adjustments are going to be 0.25 in length.

This pair is a 3, so I translate only my x coordinate and I am at .75, .5. The winding is now rotated over and my next scale down will be (LR, LL, UL, UR). The scale is now .125, and so on and so on until I run out of bits in my address.

like image 21
Mikeb Avatar answered Nov 12 '22 20:11

Mikeb