Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Inserting null values into cassandra

I have some fields that I am storing into Cassandra, but some of them could be null at any given point. As there are quite a lot of them, it makes the code much more readable if I don’t check each one for null before adding it to the INSERT.

Is there any harm in doing so?

EDIT!!

There is a jira ticket that I found. But I am unable to understand what solution was finally implemented from the ticket. https://issues.apache.org/jira/browse/CASSANDRA-7304

like image 316
ArchitGarg Avatar asked Nov 21 '16 06:11

ArchitGarg


People also ask

Can we insert null value in Cassandra?

Indeed, it is pretty easy to populate a row with null values: INSERT INTO mytable (pk, c2, c3) VALUES (0x1234, null, null); Notice that I say macroscopic effect, because when you explicitly insert a null value, C* will insert a tombstone under the hood.

How do I change the null value in Cassandra?

update keyspace. table set ric=null where code='code'; You can also use null in insert statements, but if you omit the value it's the same as saying null so there's no point.

How do you check NOT null in Cassandra?

How do I query in cassandra for != null columns. Select * from tableA where id != null; Select * from tableA where name !=

What causes tombstones in Cassandra?

In Cassandra, deleted data is not immediately purged from the disk. Instead, Cassandra writes a special value, known as a tombstone, to indicate that data has been deleted. Tombstones prevent deleted data from being returned during reads, and will eventually allow the data to be dropped via compaction.


2 Answers

Inserting a null value creates a tombstone.
You should not create tombstone :
1. Tombstone take up space and can substantially increase the amount of storage you require.
2. Querying tables with a large number of tombstones causes performance problems and it causes Latency and heap pressure.

like image 165
Ashraful Islam Avatar answered Oct 21 '22 19:10

Ashraful Islam


The beautiful thing about Cassandra's new storage engine is the ability to NOT store values. What it means is what it was meant to be: a null value is simply a value that should not be there.

This gives great flexibility, because a null value not explicitly (or implicitly, see later) inserted won't take storage space, nor use processing power and IO bandwidth.

Indeed, it is pretty easy to populate a row with null values:

INSERT INTO mytable (pk, c2, c3) VALUES (0x1234, null, null);

This way you are explicitly telling C* to store a null value in both c2 and c3. However, you could get the same macroscopic effect with:

INSERT INTO mytable (pk) VALUES (0x1234);

Notice that I say macroscopic effect, because when you explicitly insert a null value, C* will insert a tombstone under the hood. In the long run this will bite you, due to how C* perform searches, compactions, etc... so you should avoid whenever possible, the second version will perform much better.

Now, there is also a trap: you can also create tombstones implicitly. This will happen when you use the TTL features builtin in Cassandra.

In conclusion, if you care about yourself I'd suggest to NOT performing any null value inserts. Do a check at application level, you'll save time (and money) later, eg during reads.

like image 35
xmas79 Avatar answered Oct 21 '22 18:10

xmas79