
Overwriting a row in Cassandra with INSERT: will it cause a tombstone?

Writing data to Cassandra without causing it to create tombstones is vital in our case, due to the amount of data and the speed required. So far we have only written each row once and never needed to update it again, only to read it back.

Now a case has come up where we need to write data and then complete it with more data that only becomes available after a while. This can be done by either:

  1. overwriting all of the data in the row again using INSERT (all data is available), or

  2. performing an UPDATE only on the new data.

What is the best way to do it, bearing in mind that speed and not creating tombstones are important?

Andreas Mattisson asked Jun 25 '15 14:06




2 Answers

Tombstones are only created when data is deleted (which includes explicitly writing null to a column) or when TTL values are used.

Cassandra fits your described use case very well. Incrementally adding data works with both INSERT and UPDATE statements. When data for the same partition key is written over time, Cassandra stores it in different locations (SSTables). Periodic compactions merge the data for a single key again to optimize access and free disk space. This merge is based on the timestamps of the written values and does not create any new tombstones. You can learn more about how Cassandra stores data e.g. here.

Stefan Podkowinski answered Oct 20 '22 08:10



It would be more efficient to do an update to add new or changed data. There is no need to rewrite the old data that isn't changing and it would be inefficient to make Cassandra rewrite it.
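As a rough illustration of "update only the new data", the following Python sketch builds a parameterized CQL UPDATE that touches just the columns that have become available. The table name `jobs`, the key `job_id`, and the helper `build_update` are hypothetical names chosen for this example; the `?` placeholders follow the style of prepared statements, and the resulting string would still have to be prepared and executed with whatever driver you use.

```python
from typing import Any


def build_update(
    table: str, key: dict[str, Any], new_values: dict[str, Any]
) -> tuple[str, list[Any]]:
    """Build a parameterized CQL UPDATE that sets only the given columns.

    Hypothetical helper for illustration; it only builds a string and a
    parameter list and does not talk to a cluster.
    """
    # Skip None so that no explicit null is sent for a column
    # we have no data for yet.
    cols = {c: v for c, v in new_values.items() if v is not None}
    if not cols:
        raise ValueError("nothing to update")
    set_clause = ", ".join(f"{c} = ?" for c in cols)
    where_clause = " AND ".join(f"{k} = ?" for k in key)
    cql = f"UPDATE {table} SET {set_clause} WHERE {where_clause}"
    return cql, [*cols.values(), *key.values()]


cql, params = build_update(
    "jobs", {"job_id": 42}, {"result": "ok", "finished_at": None}
)
print(cql)     # UPDATE jobs SET result = ? WHERE job_id = ?
print(params)  # ['ok', 42]
```

The columns written by the original INSERT are left alone, so the old data is not sent or stored a second time.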

When you do an insert or update, Cassandra keeps a timestamp for the modification time of each column. When you do a read, Cassandra collects all the writes for that key from memory, from disk, and from other replicas, depending on the consistency level. It then merges the column data so that the newest value is used for each column.

When data is compacted on disk, if there are separate updates for different columns of a row, those will be combined into a single row in the compacted data.
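The per-column merge described in the two paragraphs above can be sketched in a few lines of Python. This is a toy model, not Cassandra's actual code: the `Cell` type, the `merge` function, and the sample column names are assumptions made for the example, and tie-breaking for equal timestamps is ignored.

```python
from typing import Any, NamedTuple


class Cell(NamedTuple):
    value: Any
    timestamp: int  # write time, e.g. in microseconds


Row = dict[str, Cell]


def merge(*fragments: Row) -> Row:
    """Combine row fragments, keeping the newest cell for each column."""
    merged: Row = {}
    for fragment in fragments:
        for column, cell in fragment.items():
            current = merged.get(column)
            # Last write wins: a cell replaces an older one for the same column.
            if current is None or cell.timestamp > current.timestamp:
                merged[column] = cell
    return merged


# Three writes for the same key, stored in different places.
sstable_1 = {"status": Cell("started", 100), "payload": Cell("abc", 100)}
sstable_2 = {"status": Cell("done", 200)}
memtable = {"result": Cell("ok", 300)}

row = merge(sstable_1, sstable_2, memtable)
print({column: cell.value for column, cell in row.items()})
# {'status': 'done', 'payload': 'abc', 'result': 'ok'}
```

In this model a read and a compaction differ only in what happens to the result: a read returns the merged row to the client, while a compaction writes it out as a single fragment and discards the superseded cells. No tombstone appears anywhere in the process.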

You don't need to worry about creating tombstones by doing an update unless you are using an update to set a TTL (Time To Live) value. In your application it sounds like you never delete data, so you will never have any tombstones.
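To make the tombstone cases concrete, here is a small Python model of when a written cell behaves as a tombstone. It is a simplified sketch with hypothetical names (`Cell`, `is_tombstone`), not Cassandra's implementation. It also covers one case this answer does not mention: explicitly writing null to a column writes a cell tombstone, which matters if a full-row INSERT includes columns that have no value yet.

```python
from typing import Any, NamedTuple, Optional


class Cell(NamedTuple):
    value: Any                 # None models an explicit null write
    written_at: int            # write time in seconds (toy clock)
    ttl: Optional[int] = None  # time to live in seconds, if any


def is_tombstone(cell: Cell, now: int) -> bool:
    """Return True if the cell counts as a tombstone at time `now`."""
    if cell.value is None:
        return True  # explicit null or delete
    # A cell written with a TTL turns into a tombstone once it expires.
    return cell.ttl is not None and now >= cell.written_at + cell.ttl


plain = Cell("ok", written_at=0)             # ordinary INSERT/UPDATE value
nulled = Cell(None, written_at=0)            # column set to null
expiring = Cell("tmp", written_at=0, ttl=60)  # written with USING TTL 60

print(is_tombstone(plain, now=1000))    # False
print(is_tombstone(nulled, now=1000))   # True
print(is_tombstone(expiring, now=30))   # False
print(is_tombstone(expiring, now=60))   # True
```

Under these assumptions, an UPDATE that sets only the newly available columns, with no TTL and no nulls, writes only cells like `plain`.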

Jim Meyer answered Oct 20 '22 08:10
