I have multiple tables whose deleted data I want to keep.
I've thought of two options to achieve that:
Option #1: Create a new table called deleted_x and, when deleting from x, immediately insert the row into deleted_x.
Advantage: queries only ever read from one table.
Disadvantages:
Option #2: Add a column called is_deleted to each of these tables, put it in the partition key, and set it to true when deleting a row.
Advantage: one table structure.
Disadvantage: is_deleted has to be mentioned in every query against the table.
Are there any additional performance considerations I should think about? Which way is better?
Option #1 is awkward, but it's probably the right way to do things in Cassandra. You can issue the two mutations (one DELETE and one INSERT) in a single logged batch, which guarantees that if one of them is applied, the other will eventually be applied too.
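A minimal sketch of Option #1, assuming a hypothetical table `users` with primary key `user_id` and a `deleted_users` table with the same columns. The helper `build_move_to_deleted_batch` is an illustrative name, not a driver API: it only renders the CQL text of a logged batch with `?` bind markers, which you would still prepare and execute through your driver. To fill in the INSERT you need the row's current values, so in practice this usually means reading the row first.

```python
from typing import Sequence


def build_move_to_deleted_batch(
    table: str, key_cols: Sequence[str], all_cols: Sequence[str]
) -> str:
    """Render a logged CQL batch that deletes one row from `table` and
    inserts the same row into `deleted_<table>` (hypothetical naming scheme).

    All values are left as `?` bind markers for a driver to fill in.
    """
    missing = [c for c in key_cols if c not in all_cols]
    if missing:
        raise ValueError(f"key columns not in column list: {missing}")
    where = " AND ".join(f"{c} = ?" for c in key_cols)
    cols = ", ".join(all_cols)
    marks = ", ".join("?" for _ in all_cols)
    # BEGIN BATCH without UNLOGGED is a logged batch: if one mutation is
    # applied, the other will eventually be applied as well.
    return (
        "BEGIN BATCH\n"
        f"  DELETE FROM {table} WHERE {where};\n"
        f"  INSERT INTO deleted_{table} ({cols}) VALUES ({marks});\n"
        "APPLY BATCH;"
    )


if __name__ == "__main__":
    print(build_move_to_deleted_batch("users", ["user_id"], ["user_id", "name", "email"]))
```

Values bind in statement order: the key for the DELETE first, then the full row for the INSERT. Because the two statements touch different tables (and therefore different partitions), the batch goes through the batch log, which adds some write overhead compared with a single write.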
Option #2 isn't as easy as you might expect if you're coming from a relational background, because adding an is_deleted column to a Cassandra table and expecting to be able to query against it isn't trivial. The main reason is that Cassandra performs significantly better when querying by primary key (partition key(s) plus optional clustering key(s)) than through secondary indexes. For maximum performance you'd therefore need to model is_deleted as a clustering key, and doing so prevents you from simply issuing an UPDATE, because primary key columns can't be updated: you'd need a DELETE plus an INSERT anyway.
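To illustrate the clustering-key variant, here is a sketch assuming a hypothetical table `users` partitioned by a hypothetical `tenant_id`, with `is_deleted` as the first clustering column. Reading live rows then becomes a primary-key restriction (partition key plus a clustering prefix) rather than a secondary-index lookup. The helper `soft_delete_cql` is illustrative and only renders CQL text; as above, you need the row's current values before you can re-insert it.

```python
from typing import Sequence

# Hypothetical schema: is_deleted is the first clustering column, so a
# tenant's live rows and deleted rows form two contiguous slices of its partition.
SCHEMA = """CREATE TABLE users (
  tenant_id uuid,
  is_deleted boolean,
  user_id uuid,
  name text,
  PRIMARY KEY ((tenant_id), is_deleted, user_id)
);"""

# Live rows: partition key plus a clustering-column prefix, no secondary index.
SELECT_LIVE = "SELECT * FROM users WHERE tenant_id = ? AND is_deleted = false;"


def soft_delete_cql(
    table: str, part_col: str, id_col: str, value_cols: Sequence[str]
) -> str:
    """Render a batch that 'moves' a row from is_deleted = false to
    is_deleted = true: delete the old clustering row, insert a copy."""
    insert_cols = ", ".join([part_col, "is_deleted", id_col, *value_cols])
    # is_deleted is written as a literal; every other value is a bind marker.
    insert_vals = ", ".join(["?", "true", "?", *("?" for _ in value_cols)])
    return (
        "BEGIN BATCH\n"
        f"  DELETE FROM {table} WHERE {part_col} = ? AND is_deleted = false AND {id_col} = ?;\n"
        f"  INSERT INTO {table} ({insert_cols}) VALUES ({insert_vals});\n"
        "APPLY BATCH;"
    )


if __name__ == "__main__":
    print(SCHEMA)
    print(soft_delete_cql("users", "tenant_id", "user_id", ["name"]))
```

Both statements target the same partition (`tenant_id`), so this is a single-partition batch, which Cassandra applies atomically and in isolation. The cost is the one noted in the question: every read has to restrict `is_deleted`.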
Option #2 becomes somewhat more viable in Cassandra 3.0+ with materialized views; if you're on (or moving to) 3.0+, it may be worth considering.
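A sketch of the materialized-view route, assuming a hypothetical base table `users` with `PRIMARY KEY ((tenant_id), user_id)` and a regular `is_deleted boolean` column that is simply set with an UPDATE. The view re-keys the data so that `(tenant_id, is_deleted)` is its partition key, which avoids putting every live row into a single `is_deleted = false` partition. The helper name and the generated view name are illustrative; the code only renders CQL text.

```python
from typing import Sequence


def build_deleted_view(
    base: str,
    partition_cols: Sequence[str],
    clustering_cols: Sequence[str],
    flag_col: str = "is_deleted",
) -> str:
    """Render a CREATE MATERIALIZED VIEW that adds the flag column to the
    base table's partition key.

    A view's primary key must contain every base primary key column plus at
    most one non-key column (here the flag), and each view key column must
    be filtered with IS NOT NULL.
    """
    key_cols = [*partition_cols, flag_col, *clustering_cols]
    not_null = " AND ".join(f"{c} IS NOT NULL" for c in key_cols)
    part = ", ".join([*partition_cols, flag_col])
    clustering = "".join(f", {c}" for c in clustering_cols)
    return (
        f"CREATE MATERIALIZED VIEW {base}_by_{flag_col} AS\n"
        f"  SELECT * FROM {base}\n"
        f"  WHERE {not_null}\n"
        f"  PRIMARY KEY (({part}){clustering});"
    )


if __name__ == "__main__":
    print(build_deleted_view("users", ["tenant_id"], ["user_id"]))
```

Live rows could then be read with `SELECT * FROM users_by_is_deleted WHERE tenant_id = ? AND is_deleted = false;`. Because of the `IS NOT NULL` filter, rows whose `is_deleted` was never written won't appear in the view, so write `false` explicitly on insert. Later Cassandra releases flag materialized views as experimental, so check the guidance for your version.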