Select 2000 most recent log entries in cassandra table using CQL (Latest version)

1 Answer

I would recommend that you design your table a bit differently. It would be rather hard to achieve what you're asking for with the design you have currently.

At the moment, each entry in your audit_event table receives a different uuid, so internally Cassandra creates many short rows. Querying for such rows is inefficient, and they are also ordered randomly (unless you use the Byte Ordered Partitioner, which you should avoid for good reasons).

However, Cassandra is pretty good at sorting columns. Going back to your example, suppose you declared your table like this:

CREATE TABLE mystuff(
  yymmddhh varchar, 
  created timeuuid,  
  stuff text, 
  PRIMARY KEY(yymmddhh, created)
);

Internally, Cassandra would create one row per hour: the row key would be the hour of the day, the column names would be the actual created timestamps, and the column values would be the stuff. That makes it efficient to query.
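To make that layout concrete, here is a rough in-memory model of it, not Cassandra's actual storage engine. The names `partitions`, `insert` and `latest` are made up for this sketch, and plain integers stand in for the timeuuid values:

```python
import bisect
from collections import defaultdict

# Row key (yymmddhh) -> list of (created, stuff) pairs, kept sorted by created.
# This only mimics the "one wide row per hour, columns sorted by name" idea.
partitions = defaultdict(list)

def insert(yymmddhh, created, stuff):
    # insort keeps the list ordered, like columns sorted within a row.
    bisect.insort(partitions[yymmddhh], (created, stuff))

def latest(yymmddhh, limit):
    # Newest first, comparable to ORDER BY created DESC LIMIT n on one row key.
    return partitions[yymmddhh][::-1][:limit]

# Integers 1..4 are stand-ins for timeuuids generated by now().
for created, stuff in [(1, "95"), (2, "96"), (3, "97"), (4, "98")]:
    insert("13081616", created, stuff)

print(latest("13081616", 2))
```

Reading the newest entries only touches one sorted list, which is why a query restricted to a single row key is cheap.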

Suppose you have the following data (to keep it short I won't go to 2k records, but the idea is the same):

insert into mystuff(yymmddhh, created, stuff) VALUES ('13081615', now(), '90');
insert into mystuff(yymmddhh, created, stuff) VALUES ('13081615', now(), '91');
insert into mystuff(yymmddhh, created, stuff) VALUES ('13081615', now(), '92');
insert into mystuff(yymmddhh, created, stuff) VALUES ('13081615', now(), '93');
insert into mystuff(yymmddhh, created, stuff) VALUES ('13081615', now(), '94');
insert into mystuff(yymmddhh, created, stuff) VALUES ('13081616', now(), '95');
insert into mystuff(yymmddhh, created, stuff) VALUES ('13081616', now(), '96');
insert into mystuff(yymmddhh, created, stuff) VALUES ('13081616', now(), '97');
insert into mystuff(yymmddhh, created, stuff) VALUES ('13081616', now(), '98');
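The row keys above are typed by hand. In an application you would derive them from the time of the event. A minimal sketch, assuming the application keeps all timestamps in one time zone (UTC here); `bucket_key` is a name invented for this example:

```python
from datetime import datetime, timezone

def bucket_key(ts: datetime) -> str:
    """Return the yymmddhh row key for a timestamp."""
    # %y = two-digit year, %m = month, %d = day, %H = hour (00-23)
    return ts.strftime("%y%m%d%H")

# An event logged on 16 August 2013 at 15:42 lands in the '13081615' row.
print(bucket_key(datetime(2013, 8, 16, 15, 42, tzinfo=timezone.utc)))
```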

Now let's say that we want to select the last two entries (let's assume for the moment that we know the "latest" row key is '13081616'). You can do it by executing a query like this:

SELECT * FROM mystuff WHERE yymmddhh = '13081616' ORDER BY created DESC LIMIT 2 ;

which should give you something like this:

 yymmddhh | created                              | stuff
----------+--------------------------------------+-------
 13081616 | 547fe280-067e-11e3-8751-97db6b0653ce |    98
 13081616 | 547f4640-067e-11e3-8751-97db6b0653ce |    97

To get the next 2 rows, take the last value from the created column and use it in the next query:

SELECT * FROM mystuff WHERE yymmddhh = '13081616'
AND created < 547f4640-067e-11e3-8751-97db6b0653ce
ORDER BY created DESC LIMIT 2;

If you receive fewer rows than you asked for, that hour is exhausted, so you should change your row key to another hour.
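The paging described above can be sketched as a small loop. This is only an illustration: `fetch_page` is a hypothetical callable that would run the SELECT with your driver, and `fake_fetch_page` with its integer `created` values is a stand-in so the example runs without a cluster:

```python
def read_bucket(fetch_page, bucket, page_size):
    """Yield all rows of one hour, newest first, one page at a time.

    fetch_page(bucket, before, limit) is assumed to return a list of
    (created, stuff) rows ordered by created DESC; before=None means the
    query has no "created < ..." condition.
    """
    before = None
    while True:
        rows = fetch_page(bucket, before, page_size)
        yield from rows
        if len(rows) < page_size:
            return  # short page: nothing older is left in this hour
        before = rows[-1][0]  # last created value becomes the next cursor

# Stand-in data, newest first; integers replace real timeuuids.
DATA = {"13081616": [(4, "98"), (3, "97"), (2, "96"), (1, "95")]}

def fake_fetch_page(bucket, before, limit):
    rows = [r for r in DATA.get(bucket, []) if before is None or r[0] < before]
    return rows[:limit]

print(list(read_bucket(fake_fetch_page, "13081616", 2)))
```

When a page comes back exactly full, the loop issues one more query that may return nothing. That extra round trip is the price of not knowing the row count in advance.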

Row key handling / calculation

So far I've assumed that we know the row key to query with. If you log a lot of information, I'd say that's not a problem: you can take the current time and issue a query with the row key set to the current hour. If you run out of rows, you can subtract one hour and issue another query.
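Walking back hour by hour could look roughly like this. All names are invented for the sketch, `read_bucket` is a hypothetical callable that returns one hour's rows newest first, and the `max_hours` cap is my own addition so the loop stops if there is less data than requested:

```python
from datetime import datetime, timedelta
from itertools import islice

def buckets_back_from(now, max_hours):
    """Yield the yymmddhh key for the current hour, then each earlier hour."""
    hour = now.replace(minute=0, second=0, microsecond=0)
    for _ in range(max_hours):
        yield hour.strftime("%y%m%d%H")
        hour -= timedelta(hours=1)

def latest_entries(read_bucket, now, wanted, max_hours=24 * 7):
    """Collect up to `wanted` rows, newest first, across hour row keys."""
    out = []
    for key in buckets_back_from(now, max_hours):
        # Take only as many rows from this hour as are still missing.
        out.extend(islice(read_bucket(key), wanted - len(out)))
        if len(out) >= wanted:
            break
    return out

# Stand-in for real queries: each hour's rows, newest first.
ROWS = {"13081616": ["98", "97", "96", "95"], "13081615": ["94", "93", "92"]}
print(latest_entries(lambda key: ROWS.get(key, []), datetime(2013, 8, 16, 16, 30), 6))
```

For the 2000-entry case you would call it with `wanted=2000`. Hours without any data simply contribute nothing and the loop moves on.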

However, if you don't know where your data lies, or if it's not distributed evenly, you can create a metadata table in which you store the row keys that are in use:

CREATE TABLE mystuff_metadata(
  yyyy varchar, 
  yymmddhh varchar, 
  PRIMARY KEY(yyyy, yymmddhh)
) WITH COMPACT STORAGE;

The row keys would be organized by year, so to get the latest row key for the current year you'd issue a query like this:

SELECT yymmddhh
FROM mystuff_metadata WHERE yyyy = '2013'
ORDER BY yymmddhh DESC LIMIT 1;

Your audit software would have to add an entry to that table on startup and again on each hour change (for example, just before inserting data into mystuff).
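One way to keep track of "on startup and on each hour change" is to remember the last row key that was announced. A sketch only: `BucketRegistrar` is a made-up helper, the `%s` placeholder style assumes a driver that accepts it (the DataStax Python driver does for non-prepared statements), and deriving yyyy by prefixing "20" assumes all dates fall in 2000-2099:

```python
METADATA_INSERT = "INSERT INTO mystuff_metadata (yyyy, yymmddhh) VALUES (%s, %s)"

class BucketRegistrar:
    """Reports which metadata rows to write before inserting into mystuff."""

    def __init__(self):
        self._last = None  # nothing announced yet, so the first call registers

    def statements_for(self, bucket):
        """Return a list of (cql, params) to execute before writing to `bucket`."""
        if bucket == self._last:
            return []  # same hour as the previous write, nothing to do
        self._last = bucket
        yyyy = "20" + bucket[:2]  # assumption: two-digit years mean 20xx
        return [(METADATA_INSERT, (yyyy, bucket))]

registrar = BucketRegistrar()
print(registrar.statements_for("13081615"))  # first write: one statement
print(registrar.statements_for("13081615"))  # same hour: empty list
print(registrar.statements_for("13081616"))  # hour changed: one statement
```

Writing the same metadata row twice is harmless, since it just overwrites the same key, so a restart in the middle of an hour does no damage.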

answered Dec 01 '22 17:12

lpiepiora


Tags:

cassandra

cql

Jay
