Cassandra hard vs soft delete

Tags: cassandra

I have multiple tables whose deleted data I want to keep.

I thought of two options to achieve that:

  1. Create a new table called deleted_x and, when deleting from x, immediately insert the row into deleted_x.

    Advantage : querying from only one table.

    Disadvantages :

    • An extra insert for each delete
    • When the original table structure changes, I will have to change the deleted table too.
  2. Add a column called is_deleted to the partition key of each of these tables, and set it to true when deleting a row.

    Advantage : One table structure

    Disadvantage : is_deleted has to be specified in every query against the table

Are there any additional performance considerations I should think about? Which approach is better?

asked Oct 31 '22 09:10 by user1394569


1 Answer

Option #1 is awkward, but it's probably the right way to do things in Cassandra. You could issue the two mutations (one DELETE and one INSERT) in a single logged batch, which guarantees that if one of them is written, the other will eventually be written as well.
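As a rough illustration of that batch, here is a minimal Python sketch that only builds the CQL text and its parameters. The helper name build_archive_batch, the example columns, and the assumption that deleted_x has the same columns as x are all hypothetical. CQL has no INSERT ... SELECT, so the sketch assumes the application has already read the row it is about to delete. The %s placeholders follow the style the DataStax Python driver accepts for non-prepared statements; actually executing the batch against a cluster is left out.

```python
from typing import Any, Dict, List, Tuple


def build_archive_batch(
    table: str, row: Dict[str, Any], key_columns: List[str]
) -> Tuple[str, Tuple[Any, ...]]:
    """Return (cql, params) for one logged batch that copies `row` into
    deleted_<table> and deletes it from <table>.

    Assumes deleted_<table> has the same columns as <table> (hypothetical).
    """
    columns = list(row)
    insert = "INSERT INTO deleted_{t} ({cols}) VALUES ({marks});".format(
        t=table,
        cols=", ".join(columns),
        marks=", ".join(["%s"] * len(columns)),
    )
    delete = "DELETE FROM {t} WHERE {where};".format(
        t=table,
        where=" AND ".join(f"{c} = %s" for c in key_columns),
    )
    # BEGIN BATCH (without UNLOGGED) is a logged batch in CQL.
    cql = "\n".join(["BEGIN BATCH", "  " + insert, "  " + delete, "APPLY BATCH;"])
    # Parameters appear in the same order as the placeholders:
    # first the INSERT values, then the DELETE key values.
    params = tuple(row[c] for c in columns) + tuple(row[c] for c in key_columns)
    return cql, params
```

For example, build_archive_batch("x", {"id": 42, "name": "alice"}, ["id"]) yields a batch containing INSERT INTO deleted_x (id, name) VALUES (%s, %s); and DELETE FROM x WHERE id = %s;, with the parameters (42, "alice", 42).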

Option #2 isn't really as easy as you may expect if you're coming from a relational background, because adding an is_deleted column to a table in Cassandra and expecting to be able to query against it isn't trivial. The primary reason is that Cassandra performs significantly better when querying against the primary key (partition key(s) + optional clustering key(s)) than against secondary indexes. Therefore, for maximum performance, you'd need to model is_deleted as a clustering key. Doing so prevents you from simply issuing an UPDATE, because primary key columns can't be modified in place, so you'd need to delete + insert anyway.
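To make the delete + insert point concrete, here is a small Python sketch under assumed names: the table x, its columns id and name, and the helper soft_delete_statements are hypothetical. With is_deleted as a clustering column, the flag is part of the row's identity, so "flipping" it means removing the live row and writing a new one.

```python
from typing import Any, List, Tuple

# Hypothetical schema: is_deleted is a clustering column, so it is part
# of the primary key and can be used efficiently in WHERE clauses.
SCHEMA = """
CREATE TABLE x (
    id int,
    is_deleted boolean,
    name text,
    PRIMARY KEY (id, is_deleted)
);
"""

# Cassandra rejects this, because primary key columns cannot appear in SET:
#   UPDATE x SET is_deleted = true WHERE id = 42;


def soft_delete_statements(row_id: int, name: str) -> List[Tuple[str, Tuple[Any, ...]]]:
    """Return the (cql, params) pairs needed to mark one row as deleted.

    The live row (is_deleted = false) is removed and an otherwise
    identical row with is_deleted = true is written in its place.
    """
    return [
        ("DELETE FROM x WHERE id = %s AND is_deleted = %s;", (row_id, False)),
        ("INSERT INTO x (id, is_deleted, name) VALUES (%s, %s, %s);", (row_id, True, name)),
    ]
```

The caller has to supply every non-key column (here just name) again, which is the same read-then-rewrite work that option #1 requires.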

Option #2 becomes somewhat more viable with Materialized Views, so if you're on Cassandra 3.0 or later, it may be worth considering.
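One way this might look, as a sketch only: keep is_deleted as a regular column in the base table, so a plain UPDATE works, and let a materialized view re-key the data by is_deleted. The helper build_deleted_view, the view name, and the table and column names are hypothetical; the generated statement uses the CREATE MATERIALIZED VIEW form, in which every view primary key column needs an IS NOT NULL restriction.

```python
from typing import List


def build_deleted_view(table: str, key_columns: List[str], flag: str = "is_deleted") -> str:
    """Return CQL for a materialized view on `table` keyed by `flag`.

    The view's primary key must contain all base primary key columns;
    `flag` is added in front as the view's partition key.
    """
    view_keys = [flag] + key_columns
    not_null = " AND ".join(f"{c} IS NOT NULL" for c in view_keys)
    return (
        f"CREATE MATERIALIZED VIEW {table}_by_{flag} AS\n"
        f"  SELECT * FROM {table}\n"
        f"  WHERE {not_null}\n"
        f"  PRIMARY KEY ({flag}, {', '.join(key_columns)});"
    )
```

With build_deleted_view("x", ["id"]), deleted rows could then be read with SELECT * FROM x_by_is_deleted WHERE is_deleted = true. Two caveats with this particular sketch: rows whose is_deleted is null never appear in the view, so live rows would need an explicit false, and a boolean partition key means the view has only two partitions, which may grow large.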

answered Nov 13 '22 03:11 by Jeff Jirsa