Cassandra cql: how to select the LAST n rows from a table

Tags: cassandra, cql3

I want to verify that rows are being added to the table. What CQL statement would show the last n rows from the table below?

Table description below:

cqlsh:timeseries> describe table option_data;

CREATE TABLE option_data (
  ts bigint,
  id text,
  strike decimal,
  callask decimal,
  callbid decimal,
  maturity timestamp,
  putask decimal,
  putbid decimal,
  PRIMARY KEY ((ts), id, strike)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

cqlsh:timeseries>
asked Oct 02 '14 20:10 by Ivan



1 Answer

You didn't specify last n "by what".

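To get the last N rows by id within a single ts partition (replace N with a number):

SELECT * FROM option_data WHERE ts=1 ORDER BY id DESC LIMIT N;

ORDER BY can only be applied to clustering columns, in the order they are declared in the primary key (here id, then strike), and only when the partition key is restricted with = or IN. Because ts is the partition key of this table, rows cannot be ordered by ts across partitions. If you need to query by time, you will need to rethink your data model.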
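To make the ordering rule concrete, here is a short sketch against the option_data table from the question. The value ts = 1 and the limit of 10 are placeholders, and the rejected statements are left commented out so the block can be pasted into cqlsh as-is.

```cql
-- Accepted: the partition key is fixed with =, and the ordering starts
-- at the first clustering column (id).
SELECT * FROM option_data WHERE ts = 1 ORDER BY id DESC LIMIT 10;

-- Accepted: both clustering columns reversed together, in declared order.
SELECT * FROM option_data WHERE ts = 1 ORDER BY id DESC, strike DESC LIMIT 10;

-- Rejected: ORDER BY without restricting the partition key.
-- SELECT * FROM option_data ORDER BY id DESC LIMIT 10;

-- Rejected: ordering by strike while skipping id, which precedes it
-- in the primary key.
-- SELECT * FROM option_data WHERE ts = 1 ORDER BY strike DESC LIMIT 10;
```

None of these returns the most recent rows across the whole table, because each query is confined to one ts partition.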

If your queries are most often "last N", you might consider writing something like this:

CREATE TABLE time_series (
    id text,
    t timeuuid,
    data text,
    PRIMARY KEY (id, t)
) WITH CLUSTERING ORDER BY (t DESC);

Here 'id' is your time series id. CLUSTERING ORDER BY (t DESC) stores the rows of each partition newest first by timeuuid 't', so a "last N" query reads them in their natural stored order.
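As a rough sketch of how rows would land in this table, each write can let Cassandra generate the timeuuid with now(). The stream id and data values below are placeholders, not taken from the question.

```cql
-- now() generates a fresh timeuuid for each statement.
INSERT INTO time_series (id, t, data) VALUES ('stream id', now(), 'first event');
INSERT INTO time_series (id, t, data) VALUES ('stream id', now(), 'second event');
```

With the descending clustering order, the later insert would normally be stored ahead of the earlier one within the 'stream id' partition.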

With this, you would get the last five events as follows:

SELECT * FROM time_series WHERE id='stream id' LIMIT 5;
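The same idea could be applied to the table from the question. The sketch below is only one possible remodel, and it rests on assumptions the question does not confirm: that ts is a write time, and that one partition per day is a reasonable size. The table name option_data_by_day and the day column are hypothetical.

```cql
-- Hypothetical remodel: partition by a day bucket, cluster by ts descending.
CREATE TABLE option_data_by_day (
    day text,            -- assumed bucket, e.g. '2014-10-02'
    ts bigint,
    id text,
    strike decimal,
    callask decimal,
    callbid decimal,
    maturity timestamp,
    putask decimal,
    putbid decimal,
    PRIMARY KEY ((day), ts, id, strike)
) WITH CLUSTERING ORDER BY (ts DESC, id ASC, strike ASC);

-- Last 10 rows written on the given day, newest ts first.
SELECT * FROM option_data_by_day WHERE day = '2014-10-02' LIMIT 10;
```

The writer has to compute the day value for every insert, and a reader that wants rows spanning midnight has to query two buckets. A smaller or larger bucket may fit better depending on write volume.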

There is a lot of information out there for time series in Cassandra. I suggest reading some of the more recent articles on the matter.

answered Sep 19 '22 18:09 by Adam Holmberg