Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the most popular likes in my friend network

I am working on a way to find the most popular likes in my friend network. "Most popular in my friend network" is defined as "having the most number of likes by my friends".

Suppose each friend has an unique id and has a number of liked pages. So, given an array of such friends, I want to find the likes which is liked by the most number of friends, and also the friends who liked this thing. Essentially, I want to show something like "Your friend X, Y & Z likes this."

My first solution is to use a Map (for storing the reverse mapping: like->set)and a Priority Queue (for finding the top N). Here is my algorithm (using C++ STL):

map< like, set<friend> > like2friendsMap;
for each friend {
  for each like {
    like2friendsMap[like].insert(friend); //populate the map
  }
}

priority_queue< pair<like, int> > pq;
for each like in like2friendsMap {
  int count = like2friendsMap[like].size(); //no. of friends who like this or "popularity"
  pq.push(like, count); //count is the priority
}

map< like, set<friend> > result
for i in 1 to N { //N is how many popular items I want
   result = pq.top();  //gives me the element with highest priority (most popular like)
   pq.pop();
}

As STL internally uses Red Black Tree for implementing map and min/max heap for priority queue, this approach seems pretty fast to me. But if I have 100s of friends and each have 100s of likes, the memory usage would be huge. Of course, instead of storing the whole objects, I should use friend id and like id for all calculations which would reduce memory usage a lot.

What other algorithm or data structure can be used to improve efficiency (increase speed, reduce memory)? For some reason, I can't store the list of friends against each like, it must be computed in run time. I am developing this using C++, so solutions using STL or boost would be even better.

like image 273
Sourav Avatar asked Nov 13 '22 22:11

Sourav


1 Answers

create an integer list allPages which can be referenced by page
initialize it with 0
for every friend f
{
    for every page p liked by f
    {
        allPages[p]++;
    }
}
get the maximum of the allPages[p]

If P is the number of pages, then it will have space complexity of O(P).

If F is the number of friends and L be the average pages liked by everyone. Then its time complexity would be O(F*L). So even if you traverse all the friends again to check who all liked the page, that won't add up much to the complexity.

O(F*L) + O(F) would remain O(F*L)

I think its better to iterate again rather than storing the friends.

Or you can store the reverse reference itself for the page. That is for every page, store the list of friends who liked. That won't take much space and will do you job in minimum complexity.

like image 156
Shashwat Avatar answered Nov 15 '22 13:11

Shashwat