I have a use case which I do not know how to solve. I am asking it here to get some guidance on what I need to learn to solve this.
1.1.1.2 or CIDR ranges like 1.1.1.1/24The requirement is following
- Save all the IP address which comes as above format in-memory
- Search should work as following
- if I have IP address as 1.1.1.2 and 1.1.1.1/24, it should match the specific IP address 1.1.1.2 and not the CIDR range it falls into (1.1.1.1/24)
- If specific IP address is not found but CIDR range is available, CIDR range is returned
- if no match is found, return null/throw exception
Question
- What data structure(s) can help me solve this problem? Tries?
- what approach you would take?
- How to make sure it does not consume too much memory and search is fast (some reasonable trade-off will be there)
Thanks
I would use a binary tree (this kind of trees are called Radix Trees):
Search: Simply navigate the tree with the address you have. If you end up in a leaf then you have that specific IP address otherwise the "lowest" node with a flag (see point 3) is the most specific CIDR range that you have.
Example: let's work with 8 bits IP and 2 bit "sections" addresses; after inserting 1.1.1.0/6 the tree will be (the number after the IP is the "contains" flag, nils are dummy leaves)
<root> -- nil
|
00.00.00.00/1 (0) -- nil
|
01.00.00.00/2 (0) -- nil
|
01.00.00.00/3 (0) -- nil
|
01.01.00.00/4 (0) -- nil
|
01.01.00.00/5 (0) -- nil
|
01.01.01.00/6 (1) --nil
|
nil
If you look for the IP address 1.1.1.1 you will stop at 1.1.1.1/6, which is a CIDR range because it has nil children and is the most specific (down in the tree) that there is. If you now insert 1.1.1.1 the tree will be
<root> -- nil
|
00.00.00.00/1 (0) -- nil
|
01.00.00.00/2 (0) -- nil
|
01.00.00.00/3 (0) -- nil
|
01.01.00.00/4 (0) -- nil
|
01.01.00.00/5 (0) -- nil
|
01.01.01.00/6 (1) -- nil
|
01.01.01.00/7 (0) -- nil
|
01.01.01.01 (1)
1.1.1.1 does not have leaves because it is an IP address. Finally let's insert 1.1.2.1
<root> -- nil
|
00.00.00.00/1 (0) -- nil
|
01.00.00.00/2 (0) -- nil
|
01.00.00.00/3 (0) -- nil
|
01.01.00.00/4 (0) --------------------+
| |
01.01.00.00/5 (0) -- nil 01.01.10.00/5 (0) -- nil
| |
01.01.01.00/6 (1) -- nil 01.01.10.00/6 (0) -- nil
| |
01.01.01.00/7 (0) -- nil 01.01.10.00/7 (0) -- nil
| |
01.01.01.01 (1) 01.01.10.01 (1)
First, you need to determine some decision points. This has to be completely deterministic. The only decision point you have stated is that it should be checked for a host address (/32) first, then look for a shorter mask.
1.1.1.1/24, where
the address is longer than allowed by the mask length? Do you assume
it to be a host-only address (1.1.1.1/32), or do you look for the subnet (1.1.1.0/24) where the above decision points come into play?You are going to need to get over memory consumption concerns where you need this for a few hundred million addresses with any reasonable speed.
I will assume that host addresses will the largest number of addresses you have, and, as the mask lengths become shorter, you will have progressively fewer addresses for each mask length.
If you use hash tables (one for each of the 32 mask lengths), you could start at the longest mask length (/32) table and, if the match isn't found, look at short mask length tables, depending on the deterministic rules above.
For instance, with 1.1.1.1/24 you have in your question, assume you have aggregates of 1.1.1.0/24, 1.1.0.0/23, and 1.1.0.0/22.
Table /32
1.1.1.1
Table /31
not found
Table /30
not found
Table /29
not found
Table /28
not found
Table /27
not found
Table /26
not found
Table /25
not found
Table /24
1.1.1.0
Table /23
1.1.0.0
Table /22
1.1.0.0
Table /21
not found
Table /20
not found
Table /19
not found
Table /18
not found
Table /17
not found
Table /16
not found
Table /15
not found
Table /14
not found
Table /13
not found
Table /12
not found
Table /11
not found
Table /10
not found
Table /9
not found
Table /8
not found
Table /7
not found
Table /6
not found
Table /5
not found
Table /4
not found
Table /3
not found
Table /2
not found
Table /1
not found
For example, with 1.1.1.1/24: You could start looking for the host address in the /32 table. If you don't find that, then AND the address with the mask to get 1.1.1.0 in the /24 table. If you don't find that, AND with the next shorter mask and look in that table. Repeat until you find a match or the mask length gets to 0. Exactly how you do this depends on the the algorithm resulting from your decisions above.
This only takes up one 32-bit memory block for each host address or aggregate, but, with a few hundred million addresses, it won't be a small amount. Performance will be highly dependent on the hashing algorithm used, but it should perform quite well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With