I have some fields that I am storing into Cassandra, but some of them could be null at any given point. As there are quite a lot of them, it makes the code much more readable if I don’t check each one for null before adding it to the INSERT.
Is there any harm in doing so?
EDIT:
I found a JIRA ticket about this, but I am unable to tell from the ticket which solution was finally implemented: https://issues.apache.org/jira/browse/CASSANDRA-7304
UPDATE keyspace.table SET ric=null WHERE code='code';
You can also use null in INSERT statements, but if you omit the value, what you read back is the same as with null, so there's no point.
In Cassandra, deleted data is not immediately purged from the disk. Instead, Cassandra writes a special value, known as a tombstone, to indicate that data has been deleted. Tombstones prevent deleted data from being returned during reads, and will eventually allow the data to be dropped via compaction.
Inserting a null value creates a tombstone.
You should avoid creating tombstones:
1. Tombstones take up space and can substantially increase the amount of storage you require.
2. Querying tables with a large number of tombstones causes performance problems, namely higher latency and heap pressure.
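To make the mechanism above concrete, here is a toy in-memory model in Python. It is only a sketch of the idea and not how Cassandra is implemented: the `ToyTable` class and its method names are made up for illustration, and the default of 864000 seconds mirrors Cassandra's default `gc_grace_seconds` (10 days). It shows that writing a null leaves a marker behind, that reads hide the marker, and that the marker is only dropped by compaction once the grace period has passed.

```python
import time

# Marker standing in for a deleted/null cell (hypothetical, for illustration).
TOMBSTONE = object()


class ToyTable:
    """Toy model of tombstones; NOT Cassandra's actual storage engine."""

    def __init__(self, gc_grace_seconds=864000):
        self.gc_grace_seconds = gc_grace_seconds
        self.cells = {}  # (pk, column) -> (value, write_time)

    def write(self, pk, column, value, now=None):
        now = time.time() if now is None else now
        # Writing None is recorded as a tombstone rather than skipped.
        stored = TOMBSTONE if value is None else value
        self.cells[(pk, column)] = (stored, now)

    def read(self, pk, column):
        # Tombstoned and never-written cells both read back as None.
        value, _ = self.cells.get((pk, column), (TOMBSTONE, 0))
        return None if value is TOMBSTONE else value

    def tombstone_count(self):
        return sum(1 for value, _ in self.cells.values() if value is TOMBSTONE)

    def compact(self, now=None):
        now = time.time() if now is None else now
        # Drop only tombstones that are older than the grace period.
        self.cells = {
            key: (value, written)
            for key, (value, written) in self.cells.items()
            if not (value is TOMBSTONE and now - written >= self.gc_grace_seconds)
        }
```

In this model a null write and a missing column look identical to the reader, but only the null write occupies space and has to be skipped on every read until it is compacted away.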
The beautiful thing about Cassandra's new storage engine is the ability to NOT store values. A null value then is what it was meant to be: a value that simply is not there.
This gives great flexibility, because a null value that is not explicitly (or implicitly, see later) inserted takes no storage space and uses no processing power or I/O bandwidth.
Indeed, it is pretty easy to populate a row with null values:
INSERT INTO mytable (pk, c2, c3) VALUES (0x1234, null, null);
This way you are explicitly telling C* to store a null value in both c2 and c3. However, you could get the same macroscopic effect with:
INSERT INTO mytable (pk) VALUES (0x1234);
Notice that I say macroscopic effect, because when you explicitly insert a null value, C* will insert a tombstone under the hood. In the long run this will bite you, because of how C* performs searches, compactions, etc., so you should avoid it whenever possible. The second version will perform much better.
Now, there is also a trap: you can create tombstones implicitly as well. This happens when you use the TTL feature built into Cassandra, because data whose TTL has expired is treated as a tombstone.
In conclusion, if you care about yourself, I'd suggest NOT performing any null value inserts. Do the check at application level; you'll save time (and money) later, e.g. during reads.
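The application-level check does not have to mean one `if` per field. A minimal sketch in Python, assuming the row is available as a dict: the helper name `build_insert` is hypothetical, and the `%s` placeholders assume the style the DataStax Python driver uses for non-prepared statements. It filters out the null fields once and builds the INSERT from whatever is left.

```python
def build_insert(table, row):
    """Return (cql, params) for an INSERT that omits columns whose value is None.

    `table` and the column names are interpolated directly into the statement,
    so they must come from trusted code, never from user input.
    """
    # Keep only the columns that actually have a value.
    present = {col: val for col, val in row.items() if val is not None}
    if not present:
        raise ValueError("no non-null columns to insert")
    columns = ", ".join(present)
    placeholders = ", ".join(["%s"] * len(present))
    cql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return cql, list(present.values())


cql, params = build_insert("mytable", {"pk": 1, "c2": None, "c3": "x"})
print(cql)     # INSERT INTO mytable (pk, c3) VALUES (%s, %s)
print(params)  # [1, 'x']
```

The pair could then be passed to something like `session.execute(cql, params)`. One caveat: omitting a column leaves any value already stored for that row untouched, whereas inserting null deletes it, so this only suits cases where overwriting with null is not the intent.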