
How to increment a counter in Cassandra?

Tags:

cassandra

I'd like to use Cassandra to store a counter, for example how many times a given page has been viewed. The counter will never decrement. The value of the counter does not need to be exact, but it should be accurate over time.

My first thought was to store the value as a column and just read the current count, increment it by one and then put it back in. However, if another operation is trying to increment the counter at the same time, I think the final value would just be the one with the latest timestamp, so one of the increments would be lost.
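To make the concern concrete, here is a minimal in-memory sketch of the read, increment, write-back pattern against a column that keeps only the write with the latest timestamp. The `Column` class and `simulateTwoClients` are hypothetical names invented for illustration; they are not Cassandra APIs, and timestamp ties are not modeled.

```java
/** Illustration only: a last-write-wins column and two clients that race on it. */
final class LastWriteWinsDemo {

    /** Hypothetical stand-in for a single column; not a Cassandra class. */
    static final class Column {
        private long value;
        private long timestamp = Long.MIN_VALUE;

        long read() {
            return value;
        }

        /** Keeps the write only if it is newer than what is already stored. */
        void write(long newValue, long writeTimestamp) {
            if (writeTimestamp > timestamp) {
                value = newValue;
                timestamp = writeTimestamp;
            }
        }
    }

    /** Two clients read the same value, then both write back "value + 1". */
    static long simulateTwoClients() {
        Column views = new Column();
        views.write(10, 1);               // the page already has 10 views

        long seenByA = views.read();      // client A reads 10
        long seenByB = views.read();      // client B also reads 10

        views.write(seenByA + 1, 2);      // A writes 11
        views.write(seenByB + 1, 3);      // B writes 11 with a later timestamp
        return views.read();
    }

    public static void main(String[] args) {
        // Two increments happened, but only one is reflected.
        System.out.println(simulateTwoClients()); // prints 11
    }
}
```

Both writes succeed, yet the stored value only advances by one, which is the lost update described above.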

Another thought would be to store each page load as a new column in a column family (CF). Then I could just run get_count() on that key and get the number of columns. However, from my reading of the documentation, get_count() does not appear to be an efficient operation.

Am I approaching the problem incorrectly?

Stephen Holiday asked Aug 24 '10 16:08


4 Answers

Counters were added in Cassandra 0.8.

Use the incr command to increment the value of a column by 1.

[default@app] incr counterCF [ascii('a')][ascii('x')];
Value incremented.
[default@app] incr counterCF [ascii('a')][ascii('x')];
Value incremented.

Described here: http://www.jointhegrid.com/highperfcassandra/?p=79

Or it can be done programmatically:

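// Add 1 to the per-minute counter column in the row for this day and URL.
// (c is presumably a connected Thrift Cassandra.Client.)
CounterColumn counter = new CounterColumn();
ColumnParent cp = new ColumnParent("page_counts_by_minute");
counter.setName(ByteBufferUtil.bytes(bucketByMinute.format(r.date)));
counter.setValue(1);  // the amount to add, not an absolute value
c.add(ByteBufferUtil.bytes(bucketByDay.format(r.date) + "-" + r.url),
      cp, counter, ConsistencyLevel.ONE);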

Described here: http://www.jointhegrid.com/highperfcassandra/?cat=7

Edward Capriolo answered Oct 20 '22 14:10


[Update] Looks like counter support will be ready for primetime in 0.8!

I definitely wouldn't use get_count, as that is an O(n) operation that runs every time you read the "counter." Worse than just being O(n), it may span multiple nodes, which would introduce network latency. And finally, why tie up all that disk space when all you care about is a single number?

For right now, I wouldn't use Cassandra for counters at all. They are working on this functionality, but it's not ready for prime time yet.

https://issues.apache.org/jira/browse/CASSANDRA-1072

You've got a few options in the meantime.

1) (Bad) Store your count in a single record and have one and only one thread of your application be responsible for counter management.

2) (Better) Split the counter into n shards and have n threads, each managing one shard as a separate counter. You can randomize which thread is used by your app each time for stateless load balancing across these threads. Just make sure that each shard is managed by exactly one thread.

3) (Best) Use a separate tool that is either transactional (e.g. an RDBMS) or that supports atomic increment operations (memcached, redis).
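Option 2 can be sketched roughly as follows. This is an in-memory illustration only: the `long[]` array stands in for n separately stored counter values, and `ShardedCounter`, `increment` and `total` are hypothetical names, not part of any Cassandra client. Each shard is owned by one single-threaded executor, so the read-modify-write on a shard never races with another thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

/** Illustration only: a counter split into n shards, one owner thread per shard. */
final class ShardedCounter implements AutoCloseable {
    private final long[] shards;             // stand-in for n stored counter values
    private final ExecutorService[] owners;  // owners[i] is the only thread touching shards[i]

    ShardedCounter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        shards = new long[shardCount];
        owners = new ExecutorService[shardCount];
        for (int i = 0; i < shardCount; i++) {
            owners[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Picks a shard at random and asks its owner thread to add one. */
    Future<?> increment() {
        final int shard = ThreadLocalRandom.current().nextInt(shards.length);
        return owners[shard].submit(() -> {
            shards[shard]++;  // safe: only the owner thread reads or writes this slot
        });
    }

    /** Sums all shards, reading each one on its owner thread. */
    long total() throws Exception {
        long sum = 0;
        for (int i = 0; i < shards.length; i++) {
            final int shard = i;
            sum += owners[shard].submit(() -> shards[shard]).get();
        }
        return sum;
    }

    @Override
    public void close() {
        for (ExecutorService owner : owners) {
            owner.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        try (ShardedCounter views = new ShardedCounter(4)) {
            for (int i = 0; i < 1000; i++) {
                views.increment();
            }
            System.out.println(views.total()); // prints 1000
        }
    }
}
```

The trade-off is that a read has to visit every shard, so the shard count should stay small relative to how often the total is read.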

[Update.2] I would avoid using a distributed lock (see memcached and zookeeper mutexes), as this is very intolerant of node failure or network partitioning if improperly implemented.

Ben Burns answered Oct 20 '22 14:10


What I ended up doing was using get_count() and caching the result in a separate ColumnFamily that acts as a cache.

This way I could get a rough estimate of the count but still get the exact count whenever I wanted.

Additionally, I was able to adjust, on a per-request basis, how stale the data could be.
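The idea can be sketched like this. It is a simplified in-memory version, not the code I actually ran: `exactCount` stands in for the expensive get_count() call, a pair of fields stands in for the caching ColumnFamily, and the `CachedCount` class and its injectable clock are hypothetical names chosen for the example.

```java
import java.util.function.LongSupplier;

/** Illustration only: serve a cached count unless it is older than the caller allows. */
final class CachedCount {
    private final LongSupplier exactCount;  // stand-in for the expensive get_count() call
    private final LongSupplier clockMillis; // injectable clock, e.g. System::currentTimeMillis

    private long cachedValue;
    private long cachedAtMillis;
    private boolean populated;

    CachedCount(LongSupplier exactCount, LongSupplier clockMillis) {
        this.exactCount = exactCount;
        this.clockMillis = clockMillis;
    }

    /**
     * Returns the count, recomputing it when the cached value is at least
     * maxStalenessMillis old. Passing 0 always recomputes the exact count.
     */
    synchronized long get(long maxStalenessMillis) {
        long now = clockMillis.getAsLong();
        if (!populated || now - cachedAtMillis >= maxStalenessMillis) {
            cachedValue = exactCount.getAsLong();
            cachedAtMillis = now;
            populated = true;
        }
        return cachedValue;
    }
}
```

A page that only shows an approximate view count might call `get(60_000)`, while a report that needs the exact figure would call `get(0)`.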

Stephen Holiday answered Oct 20 '22 15:10


We are going to address a similar problem by keeping the current value of a counter in a distributed cache (for example, memcached). When the counter is updated, we will store its value in Cassandra. Therefore, even if some cache node fails, we will be able to get the value from the database.

This solution is not perfect. However, data such as a visit counter is not very sensitive, so in my opinion minor inconsistencies are acceptable.
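A rough sketch of the approach, with two in-memory maps standing in for memcached and Cassandra. The `WriteThroughCounter` class and its method names are hypothetical, and keeping the larger of the stored and new values on write-back is an assumption added here (the counter never decrements), not something we have settled on.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Illustration only: count in a cache, copy every update to a durable store. */
final class WriteThroughCounter {
    // Stand-in for the distributed cache (e.g. memcached).
    private final Map<String, AtomicLong> cache = new ConcurrentHashMap<>();
    // Stand-in for the durable store (e.g. a Cassandra column per key).
    private final Map<String, Long> store = new ConcurrentHashMap<>();

    /** Increments in the cache, then writes the new value through to the store. */
    long increment(String key) {
        // On a cache miss, seed the cached counter from the last stored value.
        AtomicLong cached = cache.computeIfAbsent(
                key, k -> new AtomicLong(store.getOrDefault(k, 0L)));
        long value = cached.incrementAndGet();
        // Keep the larger value so a late write cannot move the count backwards.
        store.merge(key, value, Long::max);
        return value;
    }

    /** Reads from the cache, falling back to the store after a cache failure. */
    long read(String key) {
        AtomicLong cached = cache.get(key);
        return cached != null ? cached.get() : store.getOrDefault(key, 0L);
    }

    /** Simulates losing the cache node. */
    void dropCache() {
        cache.clear();
    }
}
```

After `dropCache()`, a read still returns the last value that reached the store, and the next increment continues from there.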

Jacek L. answered Oct 20 '22 15:10