
Count unique visitors with Redis or Aerospike

I am trying to count unique visitors per page, and per other events (clicks, etc.), for different clients. My plan is to assign each visitor a unique cookie-based GUID and then call SADD with that GUID for every event. The Redis key will be SET_[ EVENTID ].

If I only wanted a count of users I could probably use PFADD, but my app also needs to know who the unique users are.
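The SADD plan described above can be sketched roughly as follows. This is a minimal illustration, not a production setup: `InMemorySets` is a hypothetical stand-in for a Redis client so the example runs without a server, and `record_event` is an illustrative name. The three methods mirror the Redis commands SADD, SCARD and SMEMBERS, which the redis-py client exposes under the same lowercase names.

```python
from collections import defaultdict


class InMemorySets:
    """Hypothetical stand-in for a Redis client; mimics SADD, SCARD, SMEMBERS only."""

    def __init__(self):
        self._sets = defaultdict(set)

    def sadd(self, key, *members):
        before = len(self._sets[key])
        self._sets[key].update(members)
        return len(self._sets[key]) - before  # number of newly added members

    def scard(self, key):
        return len(self._sets[key])

    def smembers(self, key):
        return set(self._sets[key])


def record_event(client, event_id, guid):
    """Add the visitor GUID to the per-event set; True if it was not seen before."""
    return client.sadd(f"SET_{event_id}", guid) == 1


client = InMemorySets()
record_event(client, "page42", "guid-a")
record_event(client, "page42", "guid-b")
record_event(client, "page42", "guid-a")  # repeat visit, not counted again

unique_count = client.scard("SET_page42")    # how many unique visitors
unique_users = client.smembers("SET_page42")  # who they are
```

Unlike PFADD, every GUID is kept in the set, which is what makes the "who" query possible and also what drives the memory growth.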

The problem is that if there are too many events or too many users, SADD will end up holding a lot of user IDs in memory. We are expecting 1000k+ user events every hour across all clients, and the number of events will also be 100+.

I would like an opinion on whether Redis is the correct storage choice. A traditional RDBMS approach does not work because of the sheer number of requests.

I am not sure whether another store, such as Aerospike, would help.

Ram asked Jan 12 '17 14:01




1 Answer

In real-time bidding (RTB), where Aerospike is used heavily, frequency capping is a common use case for demand-side platforms (DSPs). A cap is placed on the number of times a user sees a particular ad, or ads from a specific campaign. At the same time, the total number of impressions is tracked, along with the remaining budget. These counters typically have a short TTL.

Solution

You could use a composite key <page ID : user ID : yyyymmdd> as a flag for whether a specific user had visited the page, with a 24h TTL. This would live in a set page-visit in an in-memory, data-in-index namespace.

If there is no such key:

  • Create a new record with this key in the set page-visit with an initial value of 1.
  • list-append the user ID to a key <page ID : yyyymmdd> in the set page-users. This set (page-users) can live in a namespace that stores its data on SSD.

If this key exists:

  • Increment the count of the record at this key. This will provide instantaneous unique visitor counts for each page.

At the end of the day:

  • Get the count for each page, as well as the list of unique users that visited that page.
  • Read the record with key <page ID : yyyymmdd> from the set page-users
  • Assemble a batch-read against the users set based on this list of user IDs.
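The flow above can be sketched in plain Python. This is only a model of the logic, not Aerospike client code: the two dicts are stand-ins for the page-visit and page-users sets, the function names are illustrative, TTLs are not modelled, and a real deployment would need the check-and-create step to be atomic rather than the check-then-write shown here.

```python
from datetime import date

page_visit = {}  # stand-in for the in-memory set page-visit: flag key -> visit count
page_users = {}  # stand-in for the SSD-backed set page-users: page/day key -> user IDs


def record_visit(page_id, user_id, day):
    """Apply the flag-then-append steps for one page view; True if a new unique visitor."""
    stamp = day.strftime("%Y%m%d")
    flag_key = f"{page_id}:{user_id}:{stamp}"
    if flag_key not in page_visit:
        # No such key: create the flag and append the user to the page's daily list.
        page_visit[flag_key] = 1
        page_users.setdefault(f"{page_id}:{stamp}", []).append(user_id)
        return True
    # Key exists: only bump the counter, the user is already in the list.
    page_visit[flag_key] += 1
    return False


def daily_report(page_id, day):
    """End-of-day read: unique visitor count and the list of user IDs for the page."""
    users = page_users.get(f"{page_id}:{day.strftime('%Y%m%d')}", [])
    return len(users), list(users)


today = date(2017, 1, 12)
record_visit("home", "u1", today)
record_visit("home", "u2", today)
record_visit("home", "u1", today)  # repeat visit by u1
count, users = daily_report("home", today)
```

The returned list of user IDs is what would feed the batch-read against the users set.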

Advantages

  • Checking the page-visit flag is very low latency. It uses very little memory, as data-in-index namespaces take no additional space beyond the 64B of metadata that each object costs in Aerospike. For example, 10M users * 64B * replication-factor 2 = 1.28GB (about 1.2GiB) of DRAM.
  • The list of unique users per page is stored on SSD, at a much lower cost per GB than in an in-memory-only database like Redis. You just pay 64B per object for the metadata entry in the in-memory primary index. The list-append operation is very efficient, as you only send the latest user ID to be appended to the page-users record. You only use this operation when a new unique user appears on the page (guarded by the page-visit flag).
  • All these records have their 24h TTL, so you can let them expire.
  • Aerospike is a distributed key-value database that scales vertically to use all the cores on your server, and horizontally without your application requiring sharding as new nodes join. The data distribution is handled automatically by the server and tracked by the client without your application needing to change.
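
The DRAM estimate in the first advantage is simple enough to recompute for other volumes. A small sketch, taking the 64B-per-object figure and replication factor 2 from the text above; the function name is illustrative:

```python
def primary_index_bytes(objects, replication_factor=2, bytes_per_object=64):
    """DRAM used by primary-index metadata across all replicas, in bytes."""
    return objects * bytes_per_object * replication_factor


total = primary_index_bytes(10_000_000)
total_gb = total / 1e9       # decimal gigabytes
total_gib = total / 2 ** 30  # binary gibibytes
```

Note that the object count here is the number of flag records, so with one flag per user per page per day it grows with the number of pages as well as users.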

Ronen Botzer answered Sep 21 '22 10:09