Table with heavy writes and some reads in Cassandra. Primary key searches taking 30 seconds. (Queue)

We have a table in Cassandra that is set up like this (a schema sketch follows the list):

  • Primary key columns
    • shard - an integer between 1 and 1000
    • last_used - a timestamp
  • Value columns:
    • value - a 22 character string
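
For concreteness, here is a minimal sketch of that schema using the DataStax Python driver. The keyspace and table names (queue_ks, queue) are stand-ins, not our real names:

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    # Replication factor 3, matching the cluster described under "More Info".
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS queue_ks WITH replication =
            {'class': 'SimpleStrategy', 'replication_factor': 3}
    """)

    # shard is the partition key; last_used is a clustering column, so
    # within each shard the rows are stored sorted oldest-first.
    session.execute("""
        CREATE TABLE IF NOT EXISTS queue_ks.queue (
            shard     int,
            last_used timestamp,
            value     text,
            PRIMARY KEY (shard, last_used)
        ) WITH CLUSTERING ORDER BY (last_used ASC)
    """)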

Example of how this table is used:

shard | last_used        | value
------+------------------+------------------------
457   | 5/16/2012 4:56pm | NBJO3poisdjdsa4djmka8k   >--  Remove from front...
600   | 6/17/2013 5:58pm | dndiapas09eidjs9dkakah    |
...(1 million more rows)                             |
457   | NOW              | NBJO3poisdjdsa4djmka8k   <--  ...and put in back

The table is used as a giant queue. Many threads are trying to "pop" the row with the lowest last_used value, then update last_used to the current moment in time. Since last_used is part of the primary key, once a row is read it is deleted, and a new row with the same shard and value but an updated last_used time is added to the table, at the "end of the queue".
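
A sketch of that pop-and-requeue, using the same assumed names as above (note that nothing in it is atomic, which is why we lock shards, as described next):

    import datetime

    def pop_and_requeue(session, shard):
        # The clustering order makes the oldest row in the shard come first.
        row = session.execute(
            "SELECT last_used, value FROM queue_ks.queue "
            "WHERE shard = %s LIMIT 1",
            (shard,),
        ).one()
        if row is None:
            return None
        # Remove the old row from the front of the queue...
        session.execute(
            "DELETE FROM queue_ks.queue WHERE shard = %s AND last_used = %s",
            (shard, row.last_used),
        )
        # ...and re-insert the same value at the back with a fresh timestamp.
        session.execute(
            "INSERT INTO queue_ks.queue (shard, last_used, value) "
            "VALUES (%s, %s, %s)",
            (shard, datetime.datetime.utcnow(), row.value),
        )
        return row.value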

The shard is there because so many processes are trying to pop the oldest row off the front of the queue and put it at the back that they would severely bottleneck each other if only one could access the queue at a time. The rows are randomly distributed across 1000 different "shards". Each time a thread "pops" a row off the beginning of the queue, it selects a shard that no other thread is currently using (coordinated through Redis).
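
A sketch of that shard locking with the redis-py client; the key names and lock TTL here are stand-ins for illustration:

    import random

    import redis

    r = redis.Redis()

    def acquire_free_shard(num_shards=1000, ttl_seconds=30):
        # Keep trying random shards until we find one no other thread holds.
        while True:
            shard = random.randint(1, num_shards)
            # SET ... NX EX: succeeds only if the lock key does not exist yet.
            if r.set("shard-lock:%d" % shard, "locked", nx=True, ex=ttl_seconds):
                return shard

    def release_shard(shard):
        r.delete("shard-lock:%d" % shard)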

Holy crap, we must be dumb!

The problem we are having is that this operation has become very slow, on the order of about 30 seconds, a virtual eternity.

We have only been using Cassandra for less than a month, so we are not sure what we are doing wrong here. We have gotten some indication that perhaps we should not be writing and reading so much to and from the same table. Is it the case that we should not be doing this in Cassandra? Or is there perhaps some nuance in the way we are doing it, or the way we have it configured, that we need to change or adjust? How might we troubleshoot this?

More Info

  • We are using the Murmur3Partitioner (the new random partitioner)
  • The cluster is currently running on 9 servers with 2GB RAM each.
  • The replication factor is 3

Thanks so much!


1 Answer

This is something you should not use Cassandra for. The reason you're having performance issues is that Cassandra has to scan through mountains of tombstones to find the remaining live columns. Every time you delete something, Cassandra writes a tombstone, a marker that the column has been deleted. Nothing is actually removed from disk until a compaction runs. During compaction, Cassandra looks at the tombstones and determines which columns are dead and which are still live; the dead ones are thrown away (but there is also the GC grace period: to avoid spurious resurrections of deleted columns, Cassandra keeps tombstones around for a while longer).

Since you're constantly adding and removing columns, there will be enormous numbers of tombstones, and they will be spread across many SSTables. This means there is a lot of overhead work Cassandra has to do to piece a row together.

Read the blog post "Cassandra anti-patterns: queues and queue-like datasets" for some more details. It also shows you how to trace the queries to verify the issue yourself.
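
As a sketch, the same tracing can be done from the DataStax Python driver (this assumes the session and schema sketched in the question):

    result = session.execute(
        "SELECT last_used, value FROM queue_ks.queue "
        "WHERE shard = %s LIMIT 1",
        (457,),
        trace=True,
    )
    for event in result.get_query_trace().events:
        # A slow read typically shows events along the lines of
        # "Read 1 live and 30000 tombstone cells".
        print(event.source_elapsed, event.description)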

It's not entirely clear from your description what a better solution would be, but it very much sounds like a message queue such as RabbitMQ, or possibly Kafka, would be a much better fit. They are made for constant churn and FIFO semantics; Cassandra is not.

There is a way to make the queries a bit less heavy for Cassandra, which you can try (although I would still say Cassandra is the wrong tool for this job): if you can include a timestamp in the query, you should hit mostly live columns. E.g. add last_used > ? (where ? is a timestamp) to the query. This requires you to have a rough idea of the first timestamp (and don't run a query to find it out, that would be just as costly), so it might not work for you, but it would take some of the load off of Cassandra.
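
A sketch of that narrowed query, again with the schema assumed earlier; oldest_guess is hypothetical, standing in for whatever rough lower bound your application tracks:

    row = session.execute(
        "SELECT last_used, value FROM queue_ks.queue "
        "WHERE shard = %s AND last_used > %s LIMIT 1",
        (shard, oldest_guess),
    ).one()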
