
Cassandra lookup query is quite slow after deleting a large amount of data

Currently, I have a Cassandra column family with a large number of rows, say more than 100,000. Now I'd like to remove all of the data in this column family, and the following problem came up:

After all the data is removed, when I execute a lookup query against this column family, Cassandra takes tens of seconds to return an empty result. The time cost increases linearly with the amount of data that was originally there.

This is caused by the tombstones that Cassandra writes when data is deleted. Lookup speed won't recover to normal until the tombstones are garbage-collected. See Cassandra Distributed Deletes.

Because such queries are used frequently in my system, I cannot afford latency of up to several seconds.

Would you please give me a solution to this problem?

Asked Oct 22 '22 by Fify


1 Answer

This sounds like a very bad way to use a database: populate it, empty it, repeat. One way you can solve your problem is by using a different column family name each time. When you empty the data and start repopulating, create a new column family and use that, then simply drop the old column family. However, this is hacky.
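As a rough sketch of that approach (in CQL a table corresponds to a column family; the keyspace and table names below, my_keyspace, events_v1 and events_v2, are hypothetical, and your schema will differ):

-- create a fresh table for the new cycle of data
CREATE TABLE my_keyspace.events_v2 (
    id uuid PRIMARY KEY,
    payload text
);

-- point the application at events_v2, then drop the old table;
-- dropping a table discards its data without leaving per-row tombstones
DROP TABLE my_keyspace.events_v1;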

I'd suggest running a compaction (it gets rid of all the tombstones it can detect) to solve your problem. It is CPU intensive, but that is better than waiting tens of seconds for queries to respond. You can make the task less intensive on your machine by providing the specific keyspace and column family you want to compact:

./nodetool compact <ks_name> <cf_name>

Ritchard's point is a good one: gc_grace_seconds is set to 10 days by default, so you will probably have to lower it to allow compaction to get rid of the tombstones.
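A minimal sketch of that tweak, assuming a hypothetical keyspace my_keyspace and table my_table (be aware that lowering gc_grace_seconds on a multi-node cluster risks deleted data reappearing if a node misses the delete and is not repaired within the window):

-- shrink the tombstone grace period from the 10-day default (864000 seconds)
ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 3600;

Once the existing tombstones are older than the new gc_grace_seconds value, the nodetool compact command above can actually purge them.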

Answered Oct 25 '22 by Lyuben Todorov