Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cassandra TTL gets set to 0 on primary key if no TTL is specified on an update, but if it is, the TTL on the primary key does not change

This behavior in Cassandra seems counter-intuitive and I want to know why this is happening, and possibly work around this.


Imagine I have a table with three columns: pk, the primary key, a text type, foo, a bigint, and bar, another text.

insert into keyspace.table (pk, foo, bar) values ('first', 1, 'test') using ttl 60;

This creates a row in my table that has a time-to-live of 60 seconds. Looking at it, it looks like this:

  pk  | foo | bar
------------------
first |  1  | test

Now I do:

update keyspace.table using ttl 10 set bar='change' where pk='first';

And then, watching the row, I see it undergo the following changes:

  pk  | foo | bar
--------------------
first |  1  | change
first |  1  | <<null>>  // after 10 seconds
   << deleted >>        // after the initial 60 seconds

All well and good. What I wanted was for bar's time-to-live to change, but nothing else, especially not the primary key. This behavior was expected.


However, if my update doesn't have a ttl in it, or it's set to 0:

update keyspace.table set bar='change' where pk='first';

Then I see this behavior over time instead.

  pk  | foo | bar
--------------------
first |  1  | change
first |  0  | change   // after the initial 60 seconds

In other words, the row is never deleted. foo hadn't been changed, so its time-to-live was still in effect and after it passed the value was deleted (set to 0). But pk did have its time-to-live changed. This is totally unexpected.

Why does the primary key's time-to-live change only if I don't specify the time-to-live in the update? And how can I work around this so that the primary key's time-to-live will only change if I explicitly say to do so?

Edit I also found that if I use a time-to-live that's higher than the initial one it also seems to change the time-to-live on the primary key.

update keyspace.table using ttl 70 set bar='change' where pk='first';

  pk  | foo | bar
--------------------
first |  1  | change
first |  0  | change   // after the initial 60 seconds
   << deleted >>       // after the 70 seconds
like image 477
2rs2ts Avatar asked Dec 03 '14 19:12

2rs2ts


2 Answers

The effect that you are experiencing is caused by the storage model used by Cassandra.

In your example, where you have a table that does not have any clustering columns, each row in the table maps to a row in the data store (often called a "Thrift row", because this is the storage model exposed through the Thrift API). Each of the columns in your table that are not part of the primary key (so in your example the foo and the bar columns) is mapped to a column in the Thrift row. In addition to that, an extra column that is not visible in the CQL row is created as a marker that the row exists.

TTL expiration happens on the level of Thrift columns, not CQL columns. When you INSERT a row, all the columns that you insert as well as the special marker for the row itself get the same TTL.

If you UPDATE a row, only the columns that you update get a new TTL. The row marker is not touched.

When running a query with SELECT all rows for which at least one column or the special row marker exists are returned. This means that the column with the highest TTL defines how long a CQL row is visible, unless the marker for the row itself (which is only touched when using an INSERT statement) has a longer TTL.

If you want to ensure that the row's primary key gets updated with the same TTL as the new column values, the workaround is simple: Use the INSERT statement when updating a row. This will have exactly the same effect as using UPDATE, but it will also update the TTL of the row marker.

The only downside of this workaround is that it does not work in combination with lightweight transactions (IF clause in INSERT or UPDATE statements). If you need these in combination with a TTL, you have to use a more complex workaround, but this would be a separate question, I suppose.

If you want to update some columns of a row, but still want the whole row to disappear once the TTL that you specified when it was inserted originally expires, this is not directly supported by Cassandra. The only way would be to find out the TTL left for the row by first querying the TTL of one of the columns and then using this TTL in the UPDATE operation. For example, you could use SELECT TTL(foo) FROM table1 WHERE pk = 'first';. However, this has performance implications because it increases the latency (you have to wait for the result of SELECT before you can run the UPDATE).

As an alternative, you could add a column that you only use as a "row exists" marker and that you only touch during the INSERT and never in an UPDATE. You could then simply ignore rows for which this column is null, but this filtering would need to be implemented on the client side and it will not help if you cannot specifiy a TTL in an UPDATE because the updated columns would never be deleted.

like image 66
Sebastian Marsching Avatar answered Sep 22 '22 04:09

Sebastian Marsching


After some testing, those are the expected results. TTLs have the granularity of columns.

  • When doing an update, if no TTL is specified, the column TTL is set to 0. This operation doesn't affect other column TTLs.
  • We cannot update a column value and preserve the old column value TTL in a single cql command.
  • A row (or primary/partition key) is deleted when ALL column TTLs are expired. The row will not be deleted if a column has a TTL or 0.

As of today (Cassandra 2.1), Here is how you can update a column value and preserve its TTL:

SELECT TTL(col1) FROM table1 where pk=1;
// read the ttl value fetched.
UPDATE table1 USING TTL <the_ttl_value> set col1='change' where pk=1;
like image 39
Alan Boudreault Avatar answered Sep 19 '22 04:09

Alan Boudreault