I recently came across this interview question in my interview with a particular company. I tried to use maxHeap and tried to solve it, but it was not acceptable to him, as the question statement asked me to solve me in constant time and constant space.
So, I thought there is something related to bit magic, and bloom filters came to my mind, but as we know in bloom filters we can simply check if particular IP has been visited or not and it can also return false positives.
Can anyone please help me with this question. Interview is over, but I still I want to understand how can IPs be treated specially so that solution is O(c) in time and space.
IPv4 was the first version of IP. It was deployed for production in the ARPANET in 1983. Today it is the most widely used IP version. It is used to identify devices on a network using an addressing system.
When a device is assigned a static IP address, the address does not change. Most devices use dynamic IP addresses, which are assigned by the network when they connect and change over time.
IP addresses may identify your own computer, a favorite website, a network server, or even a device (such as a webcam). IP addresses are especially important for sending and receiving information. They route internet traffic where it needs to go, and they direct emails to your inbox.
Your internet service provider (ISP) assigns IP addresses to your internet-connected devices, and every IP address is unique. Considering every single internet-connected device has an IP address, billions of IP addresses exist.
This can be done in constant time for sure but not in constant space. You need space proportional to the number of unique IP addresses encountered so far.
This can be done using two data structures--
1.) A hashtable
2.) A doubly linked list(singly linked list can also be used but DLL will be more effective)
DLL will have at max 'k' nodes.
struct DLL{
string IPAddress;
int count;
struct DLL* next;
struct DLL* prev;
}
Key in the hashtable is the IP address. Value would be a tuple < count,address of Doubly Linked List Node>.
As soon as an IP address comes from the stream, it is checked in the hashtable in O(1),
if it's not present,
if number of nodes in the DLL is less than k,
add a new node at the start of the DLL for this IPAddress and a new
entry is made in the hashtable <IPAddress,<count(=1 here),its
address in DLL>>
else
other IP Addresses in the DLL must be having a count of at least 1,
so no point putting it in the DLL. Just add a new entry to the
hashtable <IPAddress,<count(=1 here),null>
else
update the tuple i.e increase the value of count by 1 and,
if DLL node address is not NULL,
go to that node in the DLL, increase count there also and shift it
rightwards if at all count's value now is greater than
next DLL node's count. Keep doing that till the concerned node
reaches the right position. This is done in O(k). k is a constant as
per the problem statement. DLL really comes handy for these operations.
We have to make sure that the DLL is always sorted in ascending order.
Also, its very important to make the shifts by swapping links and not
by swapping values otherwise we end up updating the hash table for every
which doesn't increase the time complexity but unnecessarily increases
the number of operations
else
compare this IP Address's count with the count in the first node in DLL, if
its greater than the count of first node, create a new Node and insert
in the DLL appropriately in O(k). Update the hash table for this IP Address
and for the first IPAddress in the DLL before purging that node.
So, this way, k most frequent IP Addresses are always available in the DLL in constant time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With