Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to filter Cassandra result based on WRITETIME

Tags:

cassandra

cql

I would like to get values, whose WRITETIME value is newer than a certain time. I tried this query, but it fails:

SELECT zoom,idx FROM tiles
WHERE zoom=5 AND writetime(tile) > maxTimeuuid('2015-01-01 00:05+0000')
ALLOW FILTERING;

I get this error:

SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] 
    message="line 1:68 no viable alternative at input '(' (...and idx > 0 
    and [writetime](...)">

For this table:

CREATE TABLE tiles (
    zoom int,
    idx int,
    tile blob,
    PRIMARY KEY (zoom, idx)
) WITH COMPACT STORAGE
like image 268
Yuri Astrakhan Avatar asked Jul 02 '15 12:07

Yuri Astrakhan


1 Answers

WRITETIME is a function used for displaying the time a specific column was written. It is not a part of the PRIMARY KEY, nor is it indexed, so it cannot be used in your WHERE clause. To be able to query by the time a particular row (not column) was written, you should add that to your table as an additional column and as your first clustering key:

CREATE TABLE tilesByLastWritten (
    zoom int,
    idx int,
    tile blob,
    lastwritten timeuuid,
    PRIMARY KEY (zoom, lastwritten, idx)
) WITH CLUSTERING ORDER BY (lastwritten DESC, idx ASC);

Now this query will work:

aploetz@cqlsh:stackoverflow2> SELECT * FROM tilesByLastWritten 
    WHERE zoom=5 AND lastwritten > mintimeuuid('2015-07-02 08:30:00-0500');

 zoom | lastwritten                          | idx | tile
------+--------------------------------------+-----+------
    5 | 3a439c60-20bf-11e5-b9cb-21b264d4c94d |   1 | null

(1 rows)

Notes:

  • Don't use the ALLOW FILTERING directive. Basically this tells Cassandra that it's ok to pull all of your table's rows from all of your nodes, and then apply your filters.
  • Don't use COMPACT STORAGE on table creation. This was specifically designed for people to convert new CQL3 tables to the legacy Thrift engine storage format. If you're not doing specifically that, then you shouldn't use it.
  • I specified the CLUSTERING ORDER in my example to sort the tiles table by lastwritten in DESCending order. Usually, timeseries-based applications care about getting the most-recent data, so this usually makes sense. If this is not the case for you, then the (default) ASCending order should be fine.
  • In my example I included idx as the last clustering key, mainly for uniqueness. If you find yourself having to build queries for that column, you may need a different query table (with a re-arranged primary key) to support that.

For more help in this area, give Patrick McFadin's Getting Started With Timeseries Data Modeling a read.

like image 78
Aaron Avatar answered Sep 28 '22 15:09

Aaron