I am searching for Data Structure for IPV4. What should it store? Prefix: (base + mask) --> for example 85.23.0.0/16
base = 85.23.0.0 -> 32 bit unsigned
mask = 16 AKA 255.255.0.0 -> char 8 bit unsigned
So min host is 85.23.0.0 and max host is 85.23.255.255 (I know that it should be .0.1 and .255.254 in normal case but I want to simplify it)
The main thing that I require is speed of searching IP in stored prefixes. For example I give unsigned int (32bit) and I need to tell whether it is there or no.
I am writing in C++ so I can use STL
Now it is stored in STL set (pair base + mask) and I am searching one by one, so it is sort of O(n) (Excluding that is it probably BST tree, so walking through the it might be slower than O(n))
To sum up: I don't need efficient way to store IPV4, I need an efficient way to SEARCH it in some data structure. And the data structure won't store port or family type etc. It will store PREFIX (base + mask).
And the I am searching for data structure + some algorithm of searching.
We can use btrees via which we can map the ip address in primary memory and then map them to secondary memory. As we know we need to store fairly large number of ip address we can't store them in primary memory as it is so,better if we would use a btree.
Using trie is one solution to find the longest match prefix. Trie is a data structure whose nodes have a part of the prefix. By the nature of the tree data structure, we can search the prefix efficiently by traversing the tree.
We can store an IP address with the help of INT unsigned. While using INSERT, include INET_ATON() and with SELECT, include INET_NTOA(). IP address is in dotted format.
You may check the DXR paper 'Towards a Billion Routing Lookups per Second in Software' from 2012:
http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf
This paper presents DXR, a lookup scheme based on transforming large routing tables into compact lookup structures which easily fit into cache hierarchies of modern CPUs.
Our transform distills a real-world BGP snapshot with 417,000 IPv4 prefixes and 213 distinct next hops into a structure consuming only 782 Kbytes, less than 2 bytes per prefix. Experiments show that the corresponding lookup algorithm scales linearly with the number of CPU cores: running on a commodity 8-core CPU it yields average throughput of 840 million lookups per second for uniformly random IPv4 keys.
While you are asking "don't need efficient way to store IPV4", compact storage will help to performance of lookup, because memory is very slow and CPU cache is faster (variant of memory wall of computer memory hierarchy). More compact representation have less misses from L3 cache into memory and may be faster.
DXR has very good speed (on 3.6 GHz AMD FX-8150 with DDR3 1866MHz memory):
With random search keys (a worst-case situation) D18R goes from 117 million lookups per second (Mlps) on a single core, to 840 Mlps with all 8 cores active. ... Depending on the configuration and query pattern, DXR achieves between 50 and 250 million lookups per second on 1 core
117 mlps on 3.6 GHz is around 30 cpu ticks per lookup; and memory random access latency for DDR3 1866MHz is .. more than 9-10 ns or 32-36 cpu cycles only to get first bytes from memory to memory controller when the DRAM row is open (usually it may be not - opening row is slow too). Additional time is needed to read out full cache line and some more time to forward this cache line to L3, L2, L1, registers. Full memory latency may be close to 100 ns (360 cpu cycles)
Why not use std:unordered_map? Should be between O(1) and O(n), or you can use std:map if you want a fixed performance of O(log n) (if you REALLY are interested in performance only in the searching phase). Unless your datasets are large you might find there is no significant difference.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With