Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Give the k most frequent IP addresses from the large stream of IP address in constant time and constant space

I recently came across this interview question in my interview with a particular company. I tried to use maxHeap and tried to solve it, but it was not acceptable to him, as the question statement asked me to solve me in constant time and constant space.

So, I thought there is something related to bit magic, and bloom filters came to my mind, but as we know in bloom filters we can simply check if particular IP has been visited or not and it can also return false positives.

Can anyone please help me with this question. Interview is over, but I still I want to understand how can IPs be treated specially so that solution is O(c) in time and space.

like image 712
kinshuk4 Avatar asked May 10 '15 09:05

kinshuk4


People also ask

Which IP addresses are used more frequently?

IPv4 was the first version of IP. It was deployed for production in the ARPANET in 1983. Today it is the most widely used IP version. It is used to identify devices on a network using an addressing system.

Are IP addresses constant for any given device?

When a device is assigned a static IP address, the address does not change. Most devices use dynamic IP addresses, which are assigned by the network when they connect and change over time.

What are the 2 main reasons why an IP address is used?

IP addresses may identify your own computer, a favorite website, a network server, or even a device (such as a webcam). IP addresses are especially important for sending and receiving information. They route internet traffic where it needs to go, and they direct emails to your inbox.

Is the IP address the same on all devices?

Your internet service provider (ISP) assigns IP addresses to your internet-connected devices, and every IP address is unique. Considering every single internet-connected device has an IP address, billions of IP addresses exist.


1 Answers

This can be done in constant time for sure but not in constant space. You need space proportional to the number of unique IP addresses encountered so far.

This can be done using two data structures--

1.) A hashtable

2.) A doubly linked list(singly linked list can also be used but DLL will be more effective)

DLL will have at max 'k' nodes.

struct DLL{

  string IPAddress;
  int count;
  struct DLL* next;
  struct DLL* prev;
}

Key in the hashtable is the IP address. Value would be a tuple < count,address of Doubly Linked List Node>.

As soon as an IP address comes from the stream, it is checked in the hashtable in O(1),

if it's not present,    
   if number of nodes in the DLL is less than k, 
        add a new node at the start of the DLL for this IPAddress and a new
        entry is made in the hashtable <IPAddress,<count(=1 here),its    
        address in DLL>>
   else
        other IP Addresses in the DLL must be having a count of at least 1,
        so no point putting it in the DLL. Just add a new entry to the
        hashtable <IPAddress,<count(=1 here),null>
else
   update the tuple i.e increase the value of count by 1 and,
      if DLL node address is not NULL, 
        go to that node in the DLL, increase count there also and shift it 
        rightwards if at all count's value now is greater than
        next DLL node's count. Keep doing that till the concerned node
        reaches the right position. This is done in O(k). k is a constant as 
        per the problem statement. DLL really comes handy for these operations.
        We have to make sure that the DLL is always sorted in ascending order.
        Also, its very important to make the shifts by swapping links and not 
        by swapping values otherwise we end up updating the hash table for every
        which doesn't increase the time complexity but unnecessarily increases
        the number of operations
      else
        compare this IP Address's count with the count in the first node in DLL, if 
        its greater than the count of first node, create a new Node and insert 
        in the DLL appropriately in O(k). Update the hash table for this IP Address 
        and for the first IPAddress in the DLL before purging that node.

So, this way, k most frequent IP Addresses are always available in the DLL in constant time.

like image 162
discoverAnkit Avatar answered Oct 14 '22 19:10

discoverAnkit