Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cassandra - How to retrieve most recent value

I have the following table defined in Cassandra 2.0.9:

CREATE TABLE history
(
    histid      uuid,
    ddate           text,       -- Day Date, i.e. 2014-11-20
    valtime         timestamp,  -- value time
    val             text,       --value
    PRIMARY KEY ((histid , ddate), valtime )
)
WITH CLUSTERING ORDER BY (valtime desc)
;

Scripts insert several thousand rows into this table daily.

I need to be able to select from this table knowing only the histid. However, i've partitioned the rows using (histid , ddate). Meaning, I have a full day of history values per row.

In order to select from this table for a particular histid, I also need to provide the ddate column. For example:

SELECT * FROM history
WHERE histid= cebc4c80-daa6-11e3-bcc2-005056a975a4
AND ddate = '2014-05-16'
;

To get the most recent value, I can do the following:

SELECT * FROM history
WHERE histid= cebc4c80-daa6-11e3-bcc2-005056a975a4
AND ddate = '2014-05-16'
LIMIT 1
;

However, If i want the most recent value for any given histid, I can't submit the query without knowing what ddate is, since it is part of the partition key.

So...I ask, what would be the best way to approach this?

This is what i've done, but i don;'t know if it's reasonable:

I've created a secondary table:

 CREATE TABLE history_date
(
    histid          uuid,
    maxdate         timestamp, -- most recent date
    PRIMARY KEY (histid)
);

When a row is inserted into the history table, a row is also inserted into this table using, (histid, valtime).

Our program code can then:

1.  query the history_date table for a particular id
2. take the "maxdate" column (truncate it to yyyy-mm-dd)
3. use the histid and truncated maxdate to query the history table to retrieve the most recent value.

So this works. But, it doesn't really feel like a good solution.

Is there a better way to do this, perhaps with just a single table?

Thanks for your time.

like image 783
iamtheoracle Avatar asked Jan 27 '15 15:01

iamtheoracle


1 Answers

One thing you could try, is to build a new table partitioned on a wider date range, such as month. This way, you just need to know the month to query.

CREATE TABLE history_by_month(
    histid          uuid,
    ddate           text,       -- Day Date, i.e. 2014-11-20
    valtime         timestamp,  -- value time
    val             text,       --value
    month           text,
    PRIMARY KEY (month, valtime, histid))
WITH CLUSTERING ORDER BY (valtime desc, histid asc);

Now, this query should return what you're looking for:

SELECT * FROM history_by_month
WHERE month = '2014-05'
LIMIT 1;

The only thing to keep in mind, is that if you get too many entries in a single month, you run into the chance of your partitions to be too large. If that becomes an issue, you might consider narrowing that focus to week, perhaps.

Also, anyone still on 2.0.9 should consider upgrading. Even the most recent patch level of 2.1 is much more stable.

like image 96
Aaron Avatar answered Sep 20 '22 23:09

Aaron