Cassandra hard vs soft delete

Question

I have multiple tables whose deleted data I want to keep.

I thought of two options to achieve that:

1. Create a new table called deleted_x and, when deleting from x, immediately insert the row into deleted_x.

Advantage: queries only ever touch one table.

Disadvantages:
- An insert is needed for each delete.
- When the original table structure changes, I will have to change the deleted table too.

2. Add a column called is_deleted to each of these tables, put it in the partition key, and set it to true when deleting a row.

Advantage: one table structure.

Disadvantage: is_deleted has to be mentioned in every query against the table.

Are there any additional performance considerations I should think about? Which approach is better?

Jeff Jirsa · Accepted Answer

Option #1 is awkward, but it's probably the right way to do things in Cassandra. You could issue the two mutations (one DELETE and one INSERT) in a single logged batch, which guarantees that if one is written, the other will be too.
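As a rough illustration of the batch approach, the sketch below builds the CQL text for a batch that copies a row into deleted_x and removes it from x. The table layout (an `id` key and a `name` column) and the helper name `build_archive_batch` are assumptions made for the example, not something from the question. The `%s` placeholders follow the style used by the Python driver for simple statements; adjust them for whichever client you use.

```python
def build_archive_batch(table, row, key_columns):
    """Return (cql, params) for a batch that archives a row, then deletes it.

    table       -- name of the live table, e.g. "x" (hypothetical)
    row         -- dict of column name -> value for the row being deleted
    key_columns -- primary key column names used in the DELETE's WHERE clause

    Table and column names are interpolated directly, so they must come from
    trusted code, never from user input.
    """
    columns = list(row)
    insert = "INSERT INTO deleted_{t} ({cols}) VALUES ({marks});".format(
        t=table,
        cols=", ".join(columns),
        marks=", ".join(["%s"] * len(columns)),
    )
    delete = "DELETE FROM {t} WHERE {cond};".format(
        t=table,
        cond=" AND ".join("{} = %s".format(k) for k in key_columns),
    )
    # BEGIN BATCH without UNLOGGED is a logged batch in CQL.
    cql = "BEGIN BATCH\n  {}\n  {}\nAPPLY BATCH;".format(insert, delete)
    params = [row[c] for c in columns] + [row[k] for k in key_columns]
    return cql, params


cql, params = build_archive_batch("x", {"id": 42, "name": "alice"}, ["id"])
print(cql)
print(params)
```

The returned string and parameter list would then be passed to the driver in one call, so the INSERT and the DELETE travel together.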

Option #2 isn't really as easy as you may expect if you're coming from a relational background, because adding an is_deleted column to a table in Cassandra and expecting to be able to query against it isn't trivial. The primary reason is that Cassandra performs significantly better when querying against the primary key (partition key(s) + optional clustering key(s)) than against secondary indexes. Therefore, for maximum performance, you'd need to model is_deleted as a clustering key. Because primary key columns can't be modified with an UPDATE, doing so prohibits you from simply issuing an update: you'd need to delete + insert anyway.
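To make the clustering-key variant concrete, here is a minimal sketch under an assumed schema: a table `x` with an `id` partition key, `is_deleted` as a clustering column, and a single `name` column. None of these names come from the question, and `build_soft_delete` is a hypothetical helper. It shows that "marking a row deleted" ends up as a DELETE plus an INSERT.

```python
# Hypothetical schema: is_deleted is a clustering column, so it is part of
# the primary key and cannot be changed with UPDATE ... SET.
CREATE_TABLE = """
CREATE TABLE x (
    id int,
    is_deleted boolean,
    name text,
    PRIMARY KEY ((id), is_deleted)
);
"""

# Reads of live data must always restrict on is_deleted.
SELECT_LIVE = "SELECT id, name FROM x WHERE id = %s AND is_deleted = false;"


def build_soft_delete(row_id, name):
    """Return (cql, params) that moves a row from is_deleted=false to true.

    Both statements target the same partition (same id), so the batch
    stays within a single partition.
    """
    cql = (
        "BEGIN BATCH\n"
        "  DELETE FROM x WHERE id = %s AND is_deleted = false;\n"
        "  INSERT INTO x (id, is_deleted, name) VALUES (%s, true, %s);\n"
        "APPLY BATCH;"
    )
    return cql, [row_id, row_id, name]


soft_cql, soft_params = build_soft_delete(42, "alice")
print(soft_cql)
print(soft_params)
```

Note that the caller has to supply the row's other column values (here `name`), since the row is rewritten under a new primary key rather than updated in place.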

Option #2 becomes somewhat more viable in 3.0+ with Materialized Views - if you're looking at Cassandra 3.0+, it may be worth considering.
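One possible shape for the materialized view route is sketched below. It assumes a base table `x` keyed only by `id`, with `is_deleted` as a regular column, and a view named `x_by_deleted`; all of these names are made up for the example. With this layout the application can issue a plain UPDATE on the base table and let Cassandra maintain the re-keyed view.

```python
# Hypothetical base table: is_deleted is a regular column, so UPDATE works.
CREATE_BASE = """
CREATE TABLE x (
    id int PRIMARY KEY,
    is_deleted boolean,
    name text
);
"""

# Hypothetical view: same data, but is_deleted becomes part of the key.
# Every view primary key column needs an IS NOT NULL restriction.
CREATE_VIEW = """
CREATE MATERIALIZED VIEW x_by_deleted AS
    SELECT id, is_deleted, name FROM x
    WHERE id IS NOT NULL AND is_deleted IS NOT NULL
    PRIMARY KEY ((id), is_deleted);
"""

# A soft delete is an ordinary update against the base table...
SOFT_DELETE = "UPDATE x SET is_deleted = true WHERE id = %s;"

# ...and reads of live rows go through the view.
SELECT_LIVE_FROM_VIEW = (
    "SELECT id, name FROM x_by_deleted WHERE id = %s AND is_deleted = false;"
)

for statement in (CREATE_BASE, CREATE_VIEW, SOFT_DELETE, SELECT_LIVE_FROM_VIEW):
    print(statement.strip())
```

Rows inserted without an is_deleted value would not appear in the view, so under this sketch new rows should be written with is_deleted set to false explicitly.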

Tags:

cassandra

user1394569
