
What types of tombstones does Cassandra support?

What types of tombstones does Cassandra (version 2) support? According to this article, it supports tombstones for (in CQL terms):

  • a specific column for a row.
  • static columns.
  • all rows for a partition key.

Have I missed any other types of tombstones? For example, is there one for removing a specific (CQL) row? Are there any special tombstones to support removing ranges of clustering keys or similar? This information is useful when planning schemas to avoid too many tombstones.

Ztyx asked Jan 10 '23 04:01


2 Answers

A tombstone is a marker written in place of deleted data to indicate the deletion. Tombstones can exist at different levels: on a single column, on a range of columns, or on a whole row. The example below shows the single-column and whole-row types (the range type is not covered here).

When planning your schema, you model your tables on the types of queries you are running, so rather than having one table you may find you have data duplicated across many tables. The tables are optimised to serve the incoming reads and writes. The link below should give you some good background on data modelling with Cassandra:

http://www.datastax.com/resources/data-modeling

My example: I created a table, inserted some data, and then used nodetool flush to generate some sstables. Using the sstable2json tool you can see the deletions. A whole-row deletion looks slightly different from a single-column deletion, but essentially it's still just a marker:

Here's the table with all its data:

$ ~/dse-4.5.1/resources/cassandra/bin/sstable2json ./dse-data/results/ts1/results-ts1-jb-1-Data.db 
[
{"key": "3136","columns": [["","",1417814256390000], ["col2","26",1417814256390000], ["col3","36",1417814256390000], ["id","id16",1417814256390000]]},
{"key": "3133","columns": [["","",1417814218246000], ["col2","23",1417814218246000], ["col3","33",1417814218246000], ["id","id13",1417814218246000]]},
{"key": "3135","columns": [["","",1417814244766000], ["col2","25",1417814244766000], ["col3","35",1417814244766000], ["id","id15",1417814244766000]]},
{"key": "3134","columns": [["","",1417814230711000], ["col2","24",1417814230711000], ["col3","34",1417814230711000], ["id","id14",1417814230711000]]},
{"key": "3132","columns": [["","",1417814207910000], ["col2","22",1417814207910000], ["col3","32",1417814207910000], ["id","id12",1417814207910000]]},
{"key": "3131","columns": [["","",1417814197094000], ["col2","21",1417814197094000], ["col3","31",1417814197094000], ["id","id11",1417814197094000]]},
{"key": "31","columns": [["","",1417814185270000], ["col2","2",1417814185270000], ["col3","3",1417814185270000], ["id","id1",1417814185270000]]}
]

Here are the first deletions in cqlsh (one whole row, then a single column):

cqlsh:results> delete from ts1 WHERE col1 = '1';
cqlsh:results> delete id from ts1 WHERE col1 = '11';

Here's the resulting sstable after a flush:

[datastax@DSE3 ~]$ ~/dse-4.5.1/resources/cassandra/bin/sstable2json ./dse-data/results/ts1/results-ts1-jb-2-Data.db 
[
{"key": "3131","columns": [["id","54822130",1417814320400000,"d"]]},
{"key": "31","metadata": {"deletionInfo": {"markedForDeleteAt":1417814302304000,"localDeletionTime":1417814302}},"columns": []}
]
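
A note on reading these dumps: the keys look hex-encoded ("31" is the text '1', "3131" is '11'), and the value stored in a cell flagged d appears to be the deletion time in seconds, also in hex, since it matches the cell's microsecond timestamp. This is inferred from the output above rather than from the tool's documentation, and the helper names below are made up for illustration. A minimal Python sketch:

```python
def decode_key(hex_key: str) -> str:
    """Decode a hex-encoded partition key from the dump into text (assumes UTF-8 keys)."""
    return bytes.fromhex(hex_key).decode("utf-8")


def decode_deletion_time(hex_value: str) -> int:
    """Interpret the value of a 'd' cell as hex-encoded epoch seconds (an assumption)."""
    return int(hex_value, 16)


# Values taken from the sstable2json output above.
print(decode_key("31"))                   # key of the fully deleted row
print(decode_key("3131"))                 # key of the row whose id column was deleted
print(decode_deletion_time("54822130"))   # seconds; compare with 1417814320400000 microseconds
```

Comparing the decoded seconds with the cell timestamp divided by 1,000,000 is a quick way to check the assumption against your own dumps.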

Here's the next delete in cqlsh:

cqlsh:results> delete col2 from ts1 WHERE col1 = '12';

Here's the resulting sstable after a flush:

[datastax@DSE3 ~]$ ~/dse-4.5.1/resources/cassandra/bin/sstable2json ./dse-data/results/ts1/results-ts1-jb-3-Data.db 
[
{"key": "3132","columns": [["col2","5482220b",1417814539434000,"d"]]}
]

When compaction occurs, all of these sstables are combined into a single sstable. The deleted rows and columns are all still there but marked for deletion; we can see this again after running compaction (look for the d flags with the timestamp):

[datastax@DSE3 ~]$ ./dse-4.5.1/bin/nodetool compact
[datastax@DSE3 ~]$ ~/dse-4.5.1/resources/cassandra/bin/sstable2json ./dse-data/results/ts1/results-ts1-jb-4-Data.db 
[
{"key": "3136","columns": [["","",1417814256390000], ["col2","26",1417814256390000], ["col3","36",1417814256390000], ["id","id16",1417814256390000]]},
{"key": "3133","columns": [["","",1417814218246000], ["col2","23",1417814218246000], ["col3","33",1417814218246000], ["id","id13",1417814218246000]]},
{"key": "3135","columns": [["","",1417814244766000], ["col2","25",1417814244766000], ["col3","35",1417814244766000], ["id","id15",1417814244766000]]},
{"key": "3134","columns": [["","",1417814230711000], ["col2","24",1417814230711000], ["col3","34",1417814230711000], ["id","id14",1417814230711000]]},
{"key": "3132","columns": [["","",1417814207910000], ["col2","5482220b",1417814539434000,"d"], ["col3","32",1417814207910000], ["id","id12",1417814207910000]]},
{"key": "3131","columns": [["","",1417814197094000], ["col2","21",1417814197094000], ["col3","31",1417814197094000], ["id","54822130",1417814320400000,"d"]]},
{"key": "31","metadata": {"deletionInfo": {"markedForDeleteAt":1417814302304000,"localDeletionTime":1417814302}},"columns": []}
]

The table will remain like this until gc_grace_seconds has elapsed since the deletions; on the next compaction after that, the deleted data will actually disappear. Watch as we lower gc_grace_seconds and then run compaction:

cqlsh> ALTER TABLE results.ts1 WITH gc_grace_seconds=500;
cqlsh> exit
[datastax@DSE3 ~]$ ./dse-4.5.1/bin/nodetool compact results;

[datastax@DSE3 ~]$ ./dse-4.5.1/resources/cassandra/bin/sstable2json ./dse-data/results/ts1/results-ts1-jb-5-Data.db 
[
{"key": "3136","columns": [["","",1417814256390000], ["col2","26",1417814256390000], ["col3","36",1417814256390000], ["id","id16",1417814256390000]]},
{"key": "3133","columns": [["","",1417814218246000], ["col2","23",1417814218246000], ["col3","33",1417814218246000], ["id","id13",1417814218246000]]},
{"key": "3135","columns": [["","",1417814244766000], ["col2","25",1417814244766000], ["col3","35",1417814244766000], ["id","id15",1417814244766000]]},
{"key": "3134","columns": [["","",1417814230711000], ["col2","24",1417814230711000], ["col3","34",1417814230711000], ["id","id14",1417814230711000]]},
{"key": "3132","columns": [["","",1417814207910000], ["col3","32",1417814207910000], ["id","id12",1417814207910000]]},
{"key": "3131","columns": [["","",1417814197094000], ["col2","21",1417814197094000], ["col3","31",1417814197094000]]}
]

Notice how the row for key 31 has gone, along with col2 in the row with key 3132 and id in the row with key 3131.
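
To make the timing concrete, here is a rough Python sketch of when a tombstone becomes eligible for removal, using the localDeletionTime from the deletionInfo above. The function names are hypothetical, and this is a simplification: it only models the gc_grace_seconds window, not the other conditions compaction may check before dropping a tombstone.

```python
def purge_eligible_at(local_deletion_time: int, gc_grace_seconds: int) -> int:
    """Epoch second after which a tombstone is old enough to be dropped."""
    return local_deletion_time + gc_grace_seconds


def is_purgeable(local_deletion_time: int, gc_grace_seconds: int, now: int) -> bool:
    """True once the grace period has fully elapsed (simplified model)."""
    return now > purge_eligible_at(local_deletion_time, gc_grace_seconds)


# localDeletionTime of the whole-row tombstone for key "31" in the dump above.
local_deletion_time = 1417814302

# With the table default of 864000 seconds (10 days) versus the lowered 500 seconds.
print(purge_eligible_at(local_deletion_time, 864000))
print(purge_eligible_at(local_deletion_time, 500))
```

With the default of 864000 the tombstones would have survived compaction for ten days, which is why the example lowers the setting before compacting.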

My table schema for clarity:

cqlsh:results> DESCRIBE TABLE ts1 ;

CREATE TABLE ts1 (
  col1 text,
  col2 text,
  col3 text,
  id text,
  PRIMARY KEY ((col1))
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

As a footnote, the tombstone markers in the sstable2json output are as follows:

e - expired TTL

d - deleted value (tombstone)

t - deleted range of values (range tombstone)
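
If you want to see how many tombstones of each kind an sstable holds, the flags above can be counted from the sstable2json output. The following Python sketch assumes the JSON layout shown in this answer (flag in the fourth position of a cell, whole-row deletions under metadata.deletionInfo); the function name and labels are made up for illustration.

```python
import json
from collections import Counter

# Flags as listed in the footnote above.
MARKERS = {"e": "expired TTL", "d": "cell tombstone", "t": "range tombstone"}


def count_tombstones(dump: str) -> Counter:
    """Count deletion markers by kind in sstable2json output."""
    counts = Counter()
    for row in json.loads(dump):
        # A whole-row deletion is recorded in the row metadata, not in a cell.
        if "deletionInfo" in row.get("metadata", {}):
            counts["row tombstone"] += 1
        for cell in row.get("columns", []):
            # Live cells have three elements; flagged cells carry the flag fourth.
            if len(cell) > 3 and cell[3] in MARKERS:
                counts[MARKERS[cell[3]]] += 1
    return counts


# Three rows copied from the compacted sstable shown earlier.
SAMPLE = """[
{"key": "3132","columns": [["","",1417814207910000], ["col2","5482220b",1417814539434000,"d"], ["col3","32",1417814207910000], ["id","id12",1417814207910000]]},
{"key": "3131","columns": [["","",1417814197094000], ["col2","21",1417814197094000], ["col3","31",1417814197094000], ["id","54822130",1417814320400000,"d"]]},
{"key": "31","metadata": {"deletionInfo": {"markedForDeleteAt":1417814302304000,"localDeletionTime":1417814302}},"columns": []}
]"""

print(count_tombstones(SAMPLE))
```

Run against a full dump, this gives a quick picture of which tables accumulate which kind of tombstone.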

markc answered Jan 23 '23 06:01


Adding to @markc's answer, there's also a column-range tombstone that shows up whenever you use collections. We have a set<text> column called "tags", and whenever we insert a row we get one of these (even if we're just setting it to null as in this case):

["1381316637599609:45787829:tags:_","1381316637599609:45787829:tags:!",1438264650252000,"t",1438264650],

The "t" flag marks a range tombstone, as listed in the footnote of @markc's answer. This blog post details another example of this kind of tombstone.

8forty answered Jan 23 '23 06:01