Performance - TTL vs Deleting a row in Cassandra

Tags:

cassandra

We have a massive set of data that is written into millions of rows in Cassandra. We also have a scheduler that needs to process these records and remove them once they have been processed successfully.

I was wondering whether it is better to delete each row after processing it or to mark the row with a TTL (essentially delaying its deletion).

Are there any pros or cons of deletion vs. TTL with respect to Cassandra performance?

Thanks much _DD

Durga Deep asked Oct 23 '16 16:10


2 Answers

When you use a TTL, the expired record is not removed from storage immediately; it is treated as a tombstone. It is physically removed only when compaction occurs. Until then, the data keeps consuming resources on the node. When you run a range query, even the deleted (tombstoned) records are scanned by Cassandra. Using TTL to delete a large number of entries is therefore considered an anti-pattern. The recommendation is to use temporary tables so that individual rows need not be removed: just drop the entire table.
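One way to picture the temporary-table idea is to write each day's work into its own table and drop the whole table once the scheduler has finished with it. The Python below is only a rough sketch that builds the CQL strings; the `work_items` prefix, the per-day bucket size, and the column names are all made up for illustration, and nothing here talks to a real cluster.

```python
from datetime import datetime, timezone


def bucket_table(ts: datetime, prefix: str = "work_items") -> str:
    """Name of the (hypothetical) per-day table holding rows written at ts."""
    return f"{prefix}_{ts.astimezone(timezone.utc):%Y%m%d}"


def create_stmt(table: str) -> str:
    # Column names are illustrative only.
    return (f"CREATE TABLE IF NOT EXISTS {table} "
            "(id uuid PRIMARY KEY, payload text)")


def drop_stmt(table: str) -> str:
    # Dropping the table discards all of its rows at once,
    # so no per-row delete or TTL is needed.
    return f"DROP TABLE IF EXISTS {table}"


written_at = datetime(2016, 10, 23, 16, 10, tzinfo=timezone.utc)
table = bucket_table(written_at)
print(create_stmt(table))
print(drop_stmt(table))
```

The scheduler would process every row in a bucket and then issue the `DROP TABLE` for that bucket, instead of deleting rows one by one.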

Nawaz answered Oct 16 '22 03:10


From what little information you have given here, it sounds to me like you are using Cassandra as a queue, which is a well-known anti-pattern. You can read more about that here:

http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets

However, to answer your basic question: there is little difference in performance between using TTL and deletes. TTLs in C* are handled as tombstones, which is the same as a delete. The major difference is that a tombstone is not written for a record whose TTL has expired until that record is read again, whereas calling a delete creates a tombstone immediately. Tombstones in general cause significant performance problems within C*, and while there are some methods to mitigate the issues they create, having large numbers of them usually points to a poor data model or a poor use case for C*. If you are really looking at using C* as a queue, why not look at something more fit for that purpose, such as Redis?
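To make the point about tombstones concrete, here is a toy in-memory model in Python. It is not how Cassandra stores data; `ToyTable` and everything in it are invented for illustration, and the `gc_grace` parameter is only loosely modeled on Cassandra's `gc_grace_seconds` table option. It shows that a deleted cell and a TTL-expired cell both stay around as dead entries that a scan has to step over until a compaction purges them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Cell:
    value: Optional[str]               # None once explicitly deleted
    expires_at: Optional[int]          # absolute expiry time for TTL'd cells
    deleted_at: Optional[int] = None   # set when an explicit delete is issued


class ToyTable:
    """Toy stand-in for a table; not Cassandra's storage engine."""

    def __init__(self) -> None:
        self.cells: Dict[int, Cell] = {}

    def insert(self, key: int, value: str, now: int,
               ttl: Optional[int] = None) -> None:
        expires = now + ttl if ttl is not None else None
        self.cells[key] = Cell(value, expires)

    def delete(self, key: int, now: int) -> None:
        # An explicit delete records a tombstone right away.
        self.cells[key] = Cell(None, None, deleted_at=now)

    def _dead_since(self, cell: Cell, now: int) -> Optional[int]:
        if cell.deleted_at is not None:
            return cell.deleted_at
        if cell.expires_at is not None and cell.expires_at <= now:
            return cell.expires_at
        return None

    def scan(self, now: int) -> Tuple[List[str], int]:
        """Return live values and the number of dead cells stepped over."""
        live, skipped = [], 0
        for key in sorted(self.cells):
            cell = self.cells[key]
            if self._dead_since(cell, now) is None:
                live.append(cell.value)
            else:
                skipped += 1
        return live, skipped

    def compact(self, now: int, gc_grace: int) -> None:
        """Purge cells that have been dead for at least gc_grace time units."""
        kept = {}
        for key, cell in self.cells.items():
            dead = self._dead_since(cell, now)
            if dead is None or now - dead < gc_grace:
                kept[key] = cell
        self.cells = kept


table = ToyTable()
table.insert(1, "a", now=0)            # will be deleted explicitly
table.insert(2, "b", now=0, ttl=10)    # will expire via TTL
table.insert(3, "c", now=0)            # stays live
table.delete(1, now=5)
print(table.scan(now=10))   # (['c'], 2): both dead cells are still stepped over
table.compact(now=20, gc_grace=5)
print(table.scan(now=20))   # (['c'], 0)
```

In this model the read cost of the two approaches is the same: what matters is how many dead cells sit in the scanned range before compaction gets to them.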

bechbd answered Oct 16 '22 02:10