Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to maintain lists of 'currently most popular' items for each item category, in a web application?

I need to maintain lists of 40 recent added, most popular/ most liked items for each item category(total categories around 2000) in my application. I do store the views count & no Of likes for each item. For that purpose, I'm looking to probably maintain an in-memory structure on app server, to store & retrieve these list of items.

Do you have any ideas about how to implement this in-memory data structure, & importantly, keeping in mind the associated memory footprint & minimizing it to the farthest extent)?


Using:

Java 1.6

like image 723
Rajat Gupta Avatar asked Apr 03 '12 19:04

Rajat Gupta


3 Answers

Before you settle on an in-memory structure, consider what happens when your server has to reboot. That in-memory structure is going to go away. If that is alright, then you should go with an in-memory structure. If it is not, you may want to consider simply using an separate domain object to handle this sort of metadata.

In-Memory

Apache processor threads do not share memory, so the best way to do this is probably to install something like memcached. Every time you want to get the current items, you make a call to a particular key ("topforty"). Memcached stays persistent and any of your threads can call it concurrently. This is a highly scalable solution.

In order for it work, however, you have to do additional work. Some program will need to assess the current likes and views, and update the topforty key. This could be done through your admin web application, or as a cron job on an hourly or daily basis. The service defined below could do it as well, simply doing it's work with memcached rather than with the object it's persisting.

Domain Object

If persistence is more key, and you're willing to hand off concurrence to your web application framework, then you want to create a service that handles this:

public interface PopularityService {
  public List<Item> getTopItems(int count);//gets n top items

  //lets the service know someone liked a thing
  public void registerLike(Item item, Person liker);

  //lets the service know someone viewed a 
  public void registerView(Item item, Person viewer);thing
}

This is going to require some backing object:

public class PopularStuff {
  public List<Item> popularItems
  ...
}

You should persist that object as a single object (or if your framework makes it easy, as a singleton). Your service should act on that object, deciding what should be in it and how to move things around. This will be a read-heavy solution, but not as read-heavy as other static data because presumably people will be doing a lot of views. If you're using something like Hibernate it will be very easy to jump from the list of items to the actual items in your database.

Note that I don't discuss the underlying algorithm, as you didn't ask about that but about how to implement the data structure. If you can provide details on your current framework we could discuss more specifics.

like image 95
Nathaniel Ford Avatar answered Sep 18 '22 13:09

Nathaniel Ford


Have you checked out Priority Queue? It seems like that takes care of your ordering needs, once you setup the right Comparator. If your list sizes are dynamic, then memory size might be an issue. But since you know how many items will be in each list, you can just specify that size as the initial capacity.

like image 42
Bill Bushey Avatar answered Sep 21 '22 13:09

Bill Bushey


I'm going to make a pretty big assumption, which is that when you say you "store the views count & no Of likes", you mean that they are stored in a query-friendly format (ie, an SQL Database or equivalent). The main reason you want to store information in-memory, then, is to cache the data, thereby reducing the number of database calls required to generate a page. Is this assumption correct?

If so, then I think you are over-complicating the issue. Rather than using complex data structures to maintain your information, treat it as a simple cache structure. Here's a highly simplified, pseudocode example of how it would work:

class TopXCache extends Runnable
{
  Object[] cachedData;

  int secondsToTimeOut;
  String sqlQueryToRefreshCache;

  boolean killSwitch = false;

  constructor(int itemsToKeepInCache, int secondsToTimeOut, String sqlQueryToRefreshCache)
  {
     this.secondsToTimeOut = secondsToTimeOut;
     this.sqlQueryToRefreshCache = sqlQueryToRefreshCache;

     this.cachedData = new Object[itemsToKeepInCache];
  }

  void run() // The method the thread will execute
  {
     while(!killSwitch) // Allows for "poison pill" shutdown
     {
       cachedData = executeQuery(sqlQueryToRefreshCache);
       wait(secondsToTimeOut);
     }
  }

  void kill()
  {
     killSwitch = true;
  }
}

To create a list, instantiate it with a poll time (secondsToTimeOut), an SQL query to run which will return the latest copy of the data (sqlQueryToRefresh), and the number of items you want in your list (itemsToKeepInCache, in your case, 40).

Then start up a thread which can execute the above (or a scheduled task, or a cron library task, whatever else you're using to manage timed events in your application,) and periodically your cache will refresh itself. If your system shuts down unexpectedly, then it will automatically rebuild itself from the database once the thread is restarted.

This is the basis of an incredibly simple cache. You can make it more complicated if you like, set it up as a singleton, add a "forceRefresh()" method to update the data outside the current refresh window, set it up to hold and refresh multiple caches on a single thread, or even go the whole hog and use a third party caching library.

Caching is the normal solution to this sort of problem though, and one which is normally easier to understand and maintain in the long term.

like image 29
Erica Avatar answered Sep 20 '22 13:09

Erica