Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data structure + algorithm for ipv4 storage - efficient searching in prefixes

I am searching for Data Structure for IPV4. What should it store? Prefix: (base + mask) --> for example 85.23.0.0/16

base = 85.23.0.0 -> 32 bit unsigned

mask = 16 AKA 255.255.0.0 -> char 8 bit unsigned

So min host is 85.23.0.0 and max host is 85.23.255.255 (I know that it should be .0.1 and .255.254 in normal case but I want to simplify it)

The main thing that I require is speed of searching IP in stored prefixes. For example I give unsigned int (32bit) and I need to tell whether it is there or no.

I am writing in C++ so I can use STL

Now it is stored in STL set (pair base + mask) and I am searching one by one, so it is sort of O(n) (Excluding that is it probably BST tree, so walking through the it might be slower than O(n))

To sum up: I don't need efficient way to store IPV4, I need an efficient way to SEARCH it in some data structure. And the data structure won't store port or family type etc. It will store PREFIX (base + mask).

And the I am searching for data structure + some algorithm of searching.

like image 761
user3613919 Avatar asked Jun 25 '16 21:06

user3613919


People also ask

Which data structure you would use to save and quickly access IP addresses?

We can use btrees via which we can map the ip address in primary memory and then map them to secondary memory. As we know we need to store fairly large number of ip address we can't store them in primary memory as it is so,better if we would use a btree.

How a trie can be used to efficiently find a longest prefix match?

Using trie is one solution to find the longest match prefix. Trie is a data structure whose nodes have a part of the prefix. By the nature of the tree data structure, we can search the prefix efficiently by traversing the tree.

How do I store my IP?

We can store an IP address with the help of INT unsigned. While using INSERT, include INET_ATON() and with SELECT, include INET_NTOA(). IP address is in dotted format.


2 Answers

You may check the DXR paper 'Towards a Billion Routing Lookups per Second in Software' from 2012:

http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf

This paper presents DXR, a lookup scheme based on transforming large routing tables into compact lookup structures which easily fit into cache hierarchies of modern CPUs.

Our transform distills a real-world BGP snapshot with 417,000 IPv4 prefixes and 213 distinct next hops into a structure consuming only 782 Kbytes, less than 2 bytes per prefix. Experiments show that the corresponding lookup algorithm scales linearly with the number of CPU cores: running on a commodity 8-core CPU it yields average throughput of 840 million lookups per second for uniformly random IPv4 keys.

While you are asking "don't need efficient way to store IPV4", compact storage will help to performance of lookup, because memory is very slow and CPU cache is faster (variant of memory wall of computer memory hierarchy). More compact representation have less misses from L3 cache into memory and may be faster.

DXR has very good speed (on 3.6 GHz AMD FX-8150 with DDR3 1866MHz memory):

With random search keys (a worst-case situation) D18R goes from 117 million lookups per second (Mlps) on a single core, to 840 Mlps with all 8 cores active. ... Depending on the configuration and query pattern, DXR achieves between 50 and 250 million lookups per second on 1 core

117 mlps on 3.6 GHz is around 30 cpu ticks per lookup; and memory random access latency for DDR3 1866MHz is .. more than 9-10 ns or 32-36 cpu cycles only to get first bytes from memory to memory controller when the DRAM row is open (usually it may be not - opening row is slow too). Additional time is needed to read out full cache line and some more time to forward this cache line to L3, L2, L1, registers. Full memory latency may be close to 100 ns (360 cpu cycles)

like image 188
osgx Avatar answered Sep 30 '22 14:09

osgx


Why not use std:unordered_map? Should be between O(1) and O(n), or you can use std:map if you want a fixed performance of O(log n) (if you REALLY are interested in performance only in the searching phase). Unless your datasets are large you might find there is no significant difference.

like image 21
Robbykk Avatar answered Sep 30 '22 13:09

Robbykk