Tombstone in Cassandra

Tags:

cassandra

cassandra-3.0

I have a Cassandra table with a TTL of 60 seconds, and I have a few questions about it.

1) I am getting the following warning:

Read 76 live rows and 1324 tombstone cells for query SELECT * FROM xx.yy WHERE token(y) >= token(fc872571-1253-45a1-ada3-d6f5a96668e8) LIMIT 100 (see tombstone_warn_threshold)

What does this mean?

2) As far as I understand, with a TTL a tombstone is a flag that marks the data as deleted, and the tombstone itself is removed after gc_grace_seconds.

i) Does that mean it won't be deleted until 10 days have passed?

ii) What is the consequence of it waiting for 10 days?

iii) Why is the default as long as 10 days?

https://docs.datastax.com/en/cql/3.1/cql/cql_reference/tabProp.html

gc_grace_seconds 864000 [10 days] The number of seconds after data is marked with a tombstone (deletion marker) before it is eligible for garbage-collection. Cassandra will not execute hints or batched mutations on a tombstoned record within its gc_grace_period. The default value allows a great deal of time for Cassandra to maximize consistency prior to deletion. For details about decreasing this value, see garbage collection below.

3) I read that running compaction and repair with nodetool will delete the tombstones. How frequently do we need to run these in the background, and what are the consequences of doing so?

asked Mar 27 '18 17:03

Harry

1 Answer

1. This means that your query returned 76 "live" (non-deleted, non-obsoleted) rows of data, and that it had to sift through 1324 tombstones (deletion markers) to accomplish that.

2. In the world of distributed databases, deletes are hard. After all, if you delete a piece of data from one node, and you expect that deletion to happen on all of your nodes, how would you know if it worked? Quite literally, how do you replicate nothing? Tombstones (delete markers) are the answer to that question.

i. The data is gone (obsoleted, rather). The tombstone(s) will remain for gc_grace_seconds.

ii. The "consequence" is that you'll have to put up with those tombstone warning messages for that duration, or find a way to run your query without having to scan over the tombstones.

iii. The idea behind the 10 days is that if the tombstones are collected too early, your deleted data will "ghost" its way back onto some nodes. 10 days gives you enough time to run a weekly repair, which ensures your tombstones are properly replicated before removal.

3. Compaction removes tombstones once gc_grace_seconds has passed. Repair replicates them. You should run repair once per week. While you can run compaction on demand, don't. Cassandra has its own thresholds (based on the number and size of SSTable files) to figure out when to run compaction, and it's best not to get in its way. If you do, you'll be manually running compaction from then on, as you'll probably never reach the compaction conditions organically.
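To make the "ghost" scenario above concrete, here is a minimal toy simulation in Python. It is only a sketch and not how Cassandra is implemented: replicas are plain dictionaries, "repair" simply copies the newest cell for each key to every replica, and "compaction" drops tombstones unconditionally, as if gc_grace_seconds had already passed. All names (`Cell`, `repair`, `purge_tombstones`, `run`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None marks a tombstone (deletion marker)
    timestamp: int        # write time; the newest timestamp wins


Replica = Dict[str, Cell]


def repair(replicas: List[Replica]) -> None:
    """Toy repair: every replica ends up with the newest cell for each key."""
    keys = set().union(*(r.keys() for r in replicas))
    for key in keys:
        newest = max((r[key] for r in replicas if key in r),
                     key=lambda cell: cell.timestamp)
        for r in replicas:
            r[key] = newest


def purge_tombstones(replica: Replica) -> None:
    """Toy compaction: drop tombstones as if gc_grace_seconds had passed."""
    for key in [k for k, cell in replica.items() if cell.value is None]:
        del replica[key]


def read(replica: Replica, key: str) -> Optional[str]:
    cell = replica.get(key)
    return None if cell is None or cell.value is None else cell.value


def run(repair_before_purge: bool) -> List[Optional[str]]:
    live = Cell("hello", timestamp=1)
    tombstone = Cell(None, timestamp=2)
    # The third replica was unavailable and missed the delete.
    replicas: List[Replica] = [{"k": tombstone}, {"k": tombstone}, {"k": live}]
    if repair_before_purge:
        repair(replicas)          # the tombstone reaches the third replica
    for r in replicas:
        purge_tombstones(r)       # tombstones are collected
    repair(replicas)              # a later repair runs
    return [read(r, "k") for r in replicas]


print(run(repair_before_purge=True))   # [None, None, None]
print(run(repair_before_purge=False))  # ['hello', 'hello', 'hello']
```

In the second run the tombstones are collected before the third replica ever sees them, so the later repair finds only the old live value and copies it back everywhere. That is the resurrection that gc_grace_seconds, combined with regular repair, is meant to prevent.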

The consequences are that both repair and compaction take compute resources and can reduce a node's ability to serve requests. But they need to happen. You want them to happen. If compaction doesn't run, your SSTable files will grow in number and size, eventually causing rows to be spread across multiple files, and queries for them will get slow. If repair doesn't run, your data is at risk of being out of sync.
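As a rough sanity check on the repair cadence discussed above, the arithmetic can be sketched in Python. This assumes the default gc_grace_seconds of 864000 from the quoted documentation and ignores how long a repair itself takes to finish. The function names are hypothetical and are not part of any Cassandra tooling.

```python
GC_GRACE_SECONDS = 864_000  # default from the quoted documentation (10 days)
SECONDS_PER_DAY = 86_400


def repair_slack_days(repair_interval_days: float,
                      gc_grace_seconds: int = GC_GRACE_SECONDS) -> float:
    """Days left between a scheduled repair and the end of the grace period."""
    return gc_grace_seconds / SECONDS_PER_DAY - repair_interval_days


def repair_fits(repair_interval_days: float,
                gc_grace_seconds: int = GC_GRACE_SECONDS) -> bool:
    """True if a repair is scheduled to run before tombstones become collectable."""
    return repair_slack_days(repair_interval_days, gc_grace_seconds) > 0


print(repair_slack_days(7))  # 3.0
print(repair_fits(7))        # True
print(repair_fits(14))       # False
```

With the default setting, a weekly repair leaves about three days of margin. If gc_grace_seconds is lowered on a table, the repair interval has to shrink along with it.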

answered Nov 25 '22 09:11

Aaron
