 

Cassandra Range Queries

Tags: cassandra

I'm new to Cassandra and trying out data modelling and range queries.

For learning purposes, I want to build a database that stores log lines along with their LogType and log generation time. It has to answer the following query:

Find log lines by LogType within a date range.

I modelled my database as two column families:

1) log

create column family log with comparator = 'UTF8Type' 
and key_validation_class = 'LexicalUUIDType'
and column_metadata=[{column_name: block, validation_class: UTF8Type}];

where I plan to store log lines keyed by their log IDs, for example:

set log['7561a442-24e2-11df-8924-001ff3591711'][block]='someText|11-17-2011 23:40:42|sometext';

2) ltype

create column family ltype with column_type = 'Super'
and comparator = 'TimeUUIDType'
and subcomparator = 'UTF8Type'
and column_metadata=[{column_name: id, validation_class: LexicalUUIDType}];

In this column family I will store the log type along with the time and the ID of the log line from the log column family, for example:

set ltype[ltype1][12307245916538][id]='7561a442-24e2-11df-8924-001ff3591711';

Given a log type and a date range, I want to retrieve the matching log lines.

Can someone explain how to run a range query on a super column family?

asked Nov 29 '11 17:11 by user1071714




1 Answer

An article on time series data modelling in Cassandra:

http://rubyscale.com/2011/basic-time-series-with-cassandra/

For time series, you really want larger rows, probably in the neighborhood of 10k-50k columns per row as a starting point (depending on your load). You can avoid super columns completely if you make the key a function of a "date bucket":

[datetime]_[5 second interval] (granularity again depending on load)

This way the keys can be recomputed from any time range, so a range query is just a multi_get with the keys of the buckets you want.
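As a rough illustration of the bucketing idea, here is a minimal Python sketch that recomputes the row keys covering a time range. The key layout (log type, underscore, bucket start time), the 5-second default, and the names bucket_start and bucket_keys are assumptions made for this example, not part of any Cassandra client.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def bucket_start(ts, bucket_seconds):
    """Round a naive datetime down to the start of its bucket."""
    elapsed = int((ts - EPOCH).total_seconds())
    return EPOCH + timedelta(seconds=elapsed - elapsed % bucket_seconds)


def bucket_keys(log_type, start, end, bucket_seconds=5):
    """Return the row keys of every bucket overlapping [start, end].

    Key layout is hypothetical: "<log_type>_<YYYYmmddHHMMSS of bucket start>".
    """
    keys = []
    current = bucket_start(start, bucket_seconds)
    while current <= end:
        keys.append(f"{log_type}_{current:%Y%m%d%H%M%S}")
        current += timedelta(seconds=bucket_seconds)
    return keys


if __name__ == "__main__":
    # Buckets covering 23:40:42 through 23:40:53 on 17 Nov 2011.
    for key in bucket_keys(
        "ltype1",
        datetime(2011, 11, 17, 23, 40, 42),
        datetime(2011, 11, 17, 23, 40, 53),
    ):
        print(key)
```

The resulting list is what you would hand to your client's multi_get call. The first and last buckets can contain columns outside the requested range, so those would still need to be trimmed, either with a column slice or on the client side.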

A more general overview of data modeling:

http://www.datastax.com/docs/0.8/ddl/index

answered Sep 27 '22 21:09 by zznate