I have the following table defined in Cassandra 2.0.9:
CREATE TABLE history
(
histid uuid,
ddate text, -- Day Date, i.e. 2014-11-20
valtime timestamp, -- value time
val text, --value
PRIMARY KEY ((histid , ddate), valtime )
)
WITH CLUSTERING ORDER BY (valtime desc)
;
Scripts insert several thousand rows into this table daily.
I need to be able to select from this table knowing only the histid. However, i've partitioned the rows using (histid , ddate). Meaning, I have a full day of history values per row.
In order to select from this table for a particular histid, I also need to provide the ddate column. For example:
SELECT * FROM history
WHERE histid= cebc4c80-daa6-11e3-bcc2-005056a975a4
AND ddate = '2014-05-16'
;
To get the most recent value, I can do the following:
SELECT * FROM history
WHERE histid= cebc4c80-daa6-11e3-bcc2-005056a975a4
AND ddate = '2014-05-16'
LIMIT 1
;
However, If i want the most recent value for any given histid, I can't submit the query without knowing what ddate is, since it is part of the partition key.
So...I ask, what would be the best way to approach this?
This is what i've done, but i don;'t know if it's reasonable:
I've created a secondary table:
CREATE TABLE history_date
(
histid uuid,
maxdate timestamp, -- most recent date
PRIMARY KEY (histid)
);
When a row is inserted into the history table, a row is also inserted into this table using, (histid, valtime).
Our program code can then:
1. query the history_date table for a particular id
2. take the "maxdate" column (truncate it to yyyy-mm-dd)
3. use the histid and truncated maxdate to query the history table to retrieve the most recent value.
So this works. But, it doesn't really feel like a good solution.
Is there a better way to do this, perhaps with just a single table?
Thanks for your time.
One thing you could try, is to build a new table partitioned on a wider date range, such as month
. This way, you just need to know the month to query.
CREATE TABLE history_by_month(
histid uuid,
ddate text, -- Day Date, i.e. 2014-11-20
valtime timestamp, -- value time
val text, --value
month text,
PRIMARY KEY (month, valtime, histid))
WITH CLUSTERING ORDER BY (valtime desc, histid asc);
Now, this query should return what you're looking for:
SELECT * FROM history_by_month
WHERE month = '2014-05'
LIMIT 1;
The only thing to keep in mind, is that if you get too many entries in a single month, you run into the chance of your partitions to be too large. If that becomes an issue, you might consider narrowing that focus to week, perhaps.
Also, anyone still on 2.0.9 should consider upgrading. Even the most recent patch level of 2.1 is much more stable.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With