Inserting null values into Cassandra

Tags:

cassandra

cql

tombstone

I have some fields that I am storing into Cassandra, but some of them could be null at any given point. As there are quite a lot of them, it makes the code much more readable if I don’t check each one for null before adding it to the INSERT.

Is there any harm in doing so?

EDIT:

I found a JIRA ticket, but I can't tell from it which solution was finally implemented: https://issues.apache.org/jira/browse/CASSANDRA-7304

asked Nov 21 '16 06:11

ArchitGarg

2 Answers

Inserting a null value creates a tombstone.
You should avoid creating tombstones:
1. Tombstones take up space and can substantially increase the amount of storage you require.
2. Querying tables with a large number of tombstones causes performance problems, namely higher latency and heap pressure.

answered Oct 21 '22 19:10

Ashraful Islam

The beautiful thing about Cassandra's new storage engine is its ability to NOT store values. A null value means exactly what it was meant to mean: a value that simply is not there.

This gives great flexibility: a null value that is never inserted, explicitly or implicitly (see below), takes no storage space and uses no processing power or I/O bandwidth.

Indeed, it is pretty easy to populate a row with null values:

INSERT INTO mytable (pk, c2, c3) VALUES (0x1234, null, null);

This way you are explicitly telling C* to store a null value in both c2 and c3. However, you could get the same macroscopic effect with:

INSERT INTO mytable (pk) VALUES (0x1234);

Notice that I say macroscopic effect: when you explicitly insert a null value, C* inserts a tombstone under the hood. In the long run this will bite you because of how C* performs searches, compactions, and so on, so avoid it whenever possible. The second version will perform much better.

Now, there is also a trap: you can create tombstones implicitly too. This happens when you use the TTL feature built into Cassandra, because data whose TTL has expired is treated as a tombstone.

In conclusion, if you care about yourself, I'd suggest NOT performing any null value inserts. Do the check at the application level; you'll save time (and money) later, e.g. during reads.
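To keep the calling code readable with many nullable fields, the application-level check can live in one small helper instead of being repeated per field. The following is only a sketch of that idea, not driver code: `build_insert` is a hypothetical name, the `?` placeholders assume the statement will be prepared and bound by your driver, and the table and column names are the ones from the example above.

```python
from typing import Any, List, Mapping, Tuple


def build_insert(table: str, row: Mapping[str, Any]) -> Tuple[str, List[Any]]:
    """Return (cql, params) for an INSERT that names only the non-null columns.

    Hypothetical helper: table and column names are interpolated into the
    statement text, so they must come from trusted code, never from user input.
    """
    # Drop every column whose value is None so no null (tombstone) is written.
    present = {col: val for col, val in row.items() if val is not None}
    if not present:
        raise ValueError("row has no non-null columns")

    columns = ", ".join(present)
    placeholders = ", ".join("?" for _ in present)
    cql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return cql, list(present.values())


# c2 and c3 are null here, so only pk ends up in the statement.
cql, params = build_insert(
    "mytable", {"pk": bytes.fromhex("1234"), "c2": None, "c3": None}
)
print(cql)
```

One trade-off of this approach: each distinct set of non-null columns produces a different statement text, so if you prepare statements you may want to cache them per column set. If your driver and cluster support unset bound values (the feature tracked in CASSANDRA-7304, available from Cassandra 2.2 as far as I know), leaving a variable unset is another way to avoid writing a null.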

answered Oct 21 '22 18:10

xmas79
