Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

modelling cassandra tables for upsert and select query

I have designed following table to store server alarms:

create table IF NOT EXISTS host_alerts(
    unique_key text,
    host_id text,
    occur_time timestamp,
    clear_time timestamp,
    last_occur timestamp,
    alarm_name text,
    primary key (unique_key,host_id,clear_time)
);

Let's enter some data:

truncate host_alerts;

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'1970-01-01 00:00:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:01:00+0530');

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'1970-01-01 00:00:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:02:00+0530');

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'2015-07-01 00:02:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:02:00+0530');

The query my application will be running are:

//All alarms which are **not cleared** for host_id
select * from host_alerts where  host_id = 'server-1' and clear_time = '1970-01-01 00:00:00+0530';

//All alarms which are  cleared for host_id
select * from host_alerts where  host_id = 'server-1' and clear_time > '2015-07-01 00:00:00+0530';

//All alarms between first occurrence
select * from host_alerts where  host_id = 'server-1' 
and occur_time > '2015-07-01 00:02:00+0530'and occur_time < '2015-07-01 00:05:00+0530';

I don't know if I should prepare more table example : host_alerts_by_hostname or host_alerts_by_cleartime and so on or simply add clustering index. As unique id is the only unique column but I need to retrive data from other column

Not cleared alarms: '1970-01-01 00:00:00+0530' cleared event has some date value.

host_id is server name

occur_time is when event has occured.

last_occur is time when event again reoccured.

alarm_name is what has happend with system.

How can i model my table so that I can perform these query and update based on unique_id? With what I have tried select is not possible and during upsert new row is created for same unique_key.

like image 914
kinkajou Avatar asked Aug 08 '15 10:08

kinkajou


1 Answers

I think you probably need three tables to support your three query types.

The first table would support time range queries about the history of when alerts happened for each host:

CREATE TABLE IF NOT EXISTS host_alerts_history (
    host_id text,
    occur_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, occur_time)
);

SELECT * FROM host_alerts_history WHERE host_id = 'server-1' AND occur_time > '2015-08-16 10:05:37-0400';

The second table would keep track of the uncleared alarms for each host:

CREATE TABLE IF NOT EXISTS host_uncleared_alarms (
    host_id text,
    occur_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, alarm_name)
);

SELECT * FROM host_uncleared_alarms WHERE host_id = 'server-1';

The last table would keep track of when alerts were cleared for each host:

CREATE TABLE IF NOT EXISTS host_alerts_by_cleartime (
    host_id text,
    clear_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, clear_time)
);

SELECT * FROM host_alerts_by_cleartime WHERE host_id = 'server-1' AND clear_time > '2015-08-16 10:05:37-0400';

When a new alarm event arrives, you'd execute this batch:

BEGIN BATCH
INSERT INTO host_alerts_history (host_id, occur_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
INSERT INTO host_uncleared_alarms (host_id, occur_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
APPLY BATCH;

Note that the insert into the uncleared table is an upsert, since the timestamp is not part of the key. So that table will only have one entry for each alarm name with a timestamp of the last occurrance.

When an alarm clear event arrives, you'd execute this batch:

BEGIN BATCH
DELETE FROM host_uncleared_alarms WHERE host_id = 'server-1' AND alarm_name = 'disk full';
INSERT INTO host_alerts_by_cleartime (host_id, clear_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
APPLY BATCH;

I didn't really understand what your "unique_key" is or where it comes from. I'm not sure it is needed since the combination of host_id and alarm_name should be the level of granularity you want to work with. Adding another unique key into the mix could give rise to a lot of unmatched alert/clear events. If unique_key is an alarm id, then use that as the key in place of alarm_name in my example and have alarm_name as a data column.

To prevent your tables from filling up over time with old data, you could use the TTL feature to automatically delete rows after several days.

like image 107
Jim Meyer Avatar answered Sep 19 '22 04:09

Jim Meyer