You work for Zynga, and want to count the number of currently active players for different games. Your web server handles pings from many different games and each user has a unique GUID. Must be able to query number of active users for one game at a time. Active users are those that gotten ping in last minute.
The log lines come in continuously into a web server:
10.1.12.13 - - "http://zynga.com/ping?guid=<guid>&game=<gameID>" -
What is the quickest/easiest way to count active users? Please suggest a 45 minutes answer with some code.
My version
// web server interface, every time ping comes in count() will be called
// void count(String gameId, String guid)
// int getNumberActivePlayers(String gameId)
struct Record{
String gameID;
String guid;
};
class PingStorage{
private:
max_heap<long, Record> storage;
public:
// O(log(n))
// n = total number of elements in storage
void count(String gameId, String guid){
long currentTimeStamp = getUnixTimeStamp();
Record rec ;
rec.gameId = gameId;
rec.guid = guid;
storage.add(currentTimeStamp, rec);
}
//N = numner of records in last ,minutes in storage
//O(N)
int getNumberActivePlayers(String gameId){
map<String, Set<string> > game2user;
long tillTimeStamp = getUnixTimeStampNow() - 60;
while(true){
pair<long, Record> rec = storage.getMax(); //O(1)
if(rec.first <= tillTimeStamp) break;
Set<String> temp = game2user[rec.gameid]; //O(1)
temp.add(rec.userid); //O(log(N)) - O(1)
}
return game2user[gameID].size();
}
};
Assuming this is a realtime solution, you can handle ping requests in O(1), generate the current player statistic in O(1), and using O(num_player) space by sacrificing some accuracy. The key is to discretize timings.
Overview
The basic idea is to represent discrete time intervals as objects and store in those objects the following property: the number of distinct players that pinged during this time interval that have not since pinged. To query the number of active users, calculate a weighted sum of the last x time intervals that comprise the last minute.
Details
First, select an acceptable temporal resolution. In this example, I choose 15-second intervals.
Maintain five PingInterval data structures to represent five of these intervals (spanning 1 more intervals than 1 minute). PingInterval contains one property: a counter. These PingIntervals are maintained in a PingMonitor. Each time a player pings, update a map within PingMonitor that maps each player to the current time interval. When you perform this mapping, take the following steps, which maintain the counts within the PingIntervals (according to the characteristics I described in the overview section).
(If the PingInterval to represent the current time doesn't exist yet, set the oldest PingInterval to null, create a new PingInterval in a thread-safe way, and then continue as normal.)
When you want to query the number of active users, calculate a time-passed-weighted sum of the last five intervals time intervals. For example, if you are only 5 seconds into the current time interval (meaning the next 10 seconds of that interval have not happened yet), calculate this value: 2/3 * oldest interval + sum of the 4 newest intervals.
Other thoughts
Five intervals is very conservative; we could scale up the number considerably for greater accuracy (maybe one per second) and it would still afford us a significant savings. The important part is that our times are now discrete intervals. This means that when we go to count the number of active users, we don't have to look at every individual time (which is equal to the number of users); instead we can look at x bins of time that we've predefined.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With