Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the quickest/easiest way to count active users in last one minute?

You work for Zynga, and want to count the number of currently active players for different games. Your web server handles pings from many different games and each user has a unique GUID. Must be able to query number of active users for one game at a time. Active users are those that gotten ping in last minute.

The log lines come in continuously into a web server:

10.1.12.13 - - "http://zynga.com/ping?guid=<guid>&game=<gameID>" -

What is the quickest/easiest way to count active users? Please suggest a 45 minutes answer with some code.


My version

// web server interface, every time ping comes in count() will be called
// void count(String gameId, String guid)
// int getNumberActivePlayers(String gameId)

struct Record{
  String gameID;
  String guid;
};

class PingStorage{
private:
  max_heap<long, Record> storage;
public:
  //    O(log(n))
  //  n = total number of elements in storage
  void count(String gameId, String guid){
    long currentTimeStamp = getUnixTimeStamp();
    Record rec ;
    rec.gameId = gameId;
    rec.guid = guid;
    storage.add(currentTimeStamp, rec);
  }
  //N = numner of records in last ,minutes in storage
  //O(N)
  int getNumberActivePlayers(String gameId){
    map<String, Set<string> > game2user;
    long tillTimeStamp = getUnixTimeStampNow() - 60;
    while(true){
      pair<long, Record> rec = storage.getMax(); //O(1)
      if(rec.first <= tillTimeStamp) break;  
      Set<String> temp = game2user[rec.gameid]; //O(1)
      temp.add(rec.userid); //O(log(N)) - O(1)
    }
    return game2user[gameID].size();
  }
};
like image 376
Master Yoda Avatar asked Jun 14 '12 18:06

Master Yoda


1 Answers

Assuming this is a realtime solution, you can handle ping requests in O(1), generate the current player statistic in O(1), and using O(num_player) space by sacrificing some accuracy. The key is to discretize timings.

Overview

The basic idea is to represent discrete time intervals as objects and store in those objects the following property: the number of distinct players that pinged during this time interval that have not since pinged. To query the number of active users, calculate a weighted sum of the last x time intervals that comprise the last minute.

Details

First, select an acceptable temporal resolution. In this example, I choose 15-second intervals.

Maintain five PingInterval data structures to represent five of these intervals (spanning 1 more intervals than 1 minute). PingInterval contains one property: a counter. These PingIntervals are maintained in a PingMonitor. Each time a player pings, update a map within PingMonitor that maps each player to the current time interval. When you perform this mapping, take the following steps, which maintain the counts within the PingIntervals (according to the characteristics I described in the overview section).

  • If the player is already mapped to an interval and it is the current interval, do nothing.
  • Else, if the player is mapped to an interval that isn't the current interval,
    • decrement the old interval's count,
    • increment the current interval's count,
    • and map that player to that interval.
  • Else, if the player is not mapped to an interval at all,
    • increment the current interval's count,
    • map the player to the current interval.

(If the PingInterval to represent the current time doesn't exist yet, set the oldest PingInterval to null, create a new PingInterval in a thread-safe way, and then continue as normal.)

When you want to query the number of active users, calculate a time-passed-weighted sum of the last five intervals time intervals. For example, if you are only 5 seconds into the current time interval (meaning the next 10 seconds of that interval have not happened yet), calculate this value: 2/3 * oldest interval + sum of the 4 newest intervals.

Other thoughts

Five intervals is very conservative; we could scale up the number considerably for greater accuracy (maybe one per second) and it would still afford us a significant savings. The important part is that our times are now discrete intervals. This means that when we go to count the number of active users, we don't have to look at every individual time (which is equal to the number of users); instead we can look at x bins of time that we've predefined.

like image 85
cheeken Avatar answered Oct 26 '22 00:10

cheeken