
Cassandra Schema for a Chat Application

Tags: cassandra, cql

I have gone through this article, and here is the schema I got from it. It works for maintaining a user's statuses, but how can I extend it to keep a one-to-one chat archive and the relations between users? By relations I mean that people belong to specific groups. I am new to this and need an approach.

Requirements:

  • I want to store the messages exchanged between two users in a table.
  • Whenever a user wants to load the messages sent by another user, I want to retrieve them and send them to that user.
  • I want to retrieve all the messages sent to a user by different users when that user requests them.
  • I also want to store classes of users. For example, user1 and user2 belong to "family", while user3, user4, and user1 belong to "friends". The group name can be a custom name given by the user.

This is what I have tried so far:

CREATE TABLE chatarchive (
    chat_id uuid PRIMARY KEY,
    username text,
    body text
);

CREATE TABLE chatseries (
    username text,
    time timeuuid,
    chat_id uuid,
    PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time ASC);

CREATE TABLE chattimeline (
    to text,
    username text,
    time timeuuid,
    chat_id uuid,
    PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time ASC);

Below is the schema that I currently have:

CREATE TABLE users (
    username text PRIMARY KEY,
    password text
);

CREATE TABLE friends (
    username text,
    friend text,
    since timestamp,
    PRIMARY KEY (username, friend)
);

CREATE TABLE followers (
    username text,
    follower text,
    since timestamp,
    PRIMARY KEY (username, follower)
);

CREATE TABLE tweets (
    tweet_id uuid PRIMARY KEY,
    username text,
    body text
);

CREATE TABLE userline (
    username text,
    time timeuuid,
    tweet_id uuid,
    PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time DESC);

CREATE TABLE timeline (
    username text,
    time timeuuid,
    tweet_id uuid,
    PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time DESC);
asked Jun 12 '14 05:06 by Exception


2 Answers

With C* you need to store data the way you will query it. Let's see what that looks like for this case:

  • I want to store the messages exchanged between two users in a table.
  • Whenever a user wants to load the messages sent by another user, I want to retrieve them and send them to that user.

    CREATE TABLE chat_messages (
        message_id uuid,
        from_user text,
        to_user text,
        body text,
        class text,
        time timeuuid,
        PRIMARY KEY ((from_user, to_user), time)
    ) WITH CLUSTERING ORDER BY (time ASC);
    

This lets you retrieve a timeline of the messages sent from one user to another. Note that a composite partition key is used, so one wide row (partition) is created for each ordered pair of users.

SELECT * FROM chat_messages WHERE from_user = 'mike' AND to_user = 'john' ORDER BY time DESC ;
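
One caveat worth spelling out: because the partition key is (from_user, to_user), the query above returns only the messages mike sent to john. A two-way conversation needs a second query with the users swapped, and the two results then have to be merged by time on the client. Here is a minimal Python sketch of that merge, assuming each result set arrives sorted ascending by time. The `Message` tuple and the integer timestamps are illustrative stand-ins for driver rows and timeuuid values, not part of the schema above.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Message(NamedTuple):
    time: int        # stand-in for the timeuuid clustering column
    from_user: str
    to_user: str
    body: str


def merge_conversation(a_to_b: Iterable[Message],
                       b_to_a: Iterable[Message]) -> List[Message]:
    """Merge two time-ordered result sets into one two-way timeline."""
    return list(heapq.merge(a_to_b, b_to_a, key=lambda m: m.time))


# Hypothetical results of the two one-directional queries.
mike_to_john = [Message(1, "mike", "john", "hi"),
                Message(4, "mike", "john", "lunch?")]
john_to_mike = [Message(2, "john", "mike", "hello"),
                Message(5, "john", "mike", "sure")]

timeline = merge_conversation(mike_to_john, john_to_mike)
for m in timeline:
    print(m.time, m.from_user, m.body)
```

Since both inputs are already sorted, `heapq.merge` interleaves them without re-sorting the whole conversation.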

  • I want to retrieve all the messages sent to a user by different users when that user requests them.

CREATE INDEX chat_messages_to_user ON chat_messages (to_user);

This allows you to do:

SELECT * FROM chat_messages WHERE to_user = 'john';

  • I also want to store classes of users. For example, user1 and user2 belong to "family", while user3, user4, and user1 belong to "friends". The group name can be a custom name given by the user.

CREATE INDEX chat_messages_class ON chat_messages (class);

This will allow you to do:

SELECT * FROM chat_messages WHERE class = 'family';

Note that in this kind of database, denormalizing data is good practice. Repeating the class name in every message row is therefore perfectly fine.
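
The same denormalized style can also cover the group membership itself (which users a given user has placed in "family", "friends", and so on). The sketch below is only one possible layout, not something taken from the schema above: the `user_groups` table, its columns, and the helper name are all hypothetical. The statements use the `%s` placeholder style that the DataStax Python driver accepts for simple statements.

```python
from typing import List, Sequence, Tuple

# Hypothetical table: one row per (owner, group, member).
CREATE_USER_GROUPS = """
CREATE TABLE user_groups (
    owner text,
    group_name text,
    member text,
    PRIMARY KEY (owner, group_name, member)
);
"""

INSERT_MEMBER = (
    "INSERT INTO user_groups (owner, group_name, member) VALUES (%s, %s, %s)"
)
SELECT_MEMBERS = (
    "SELECT member FROM user_groups WHERE owner = %s AND group_name = %s"
)

Statement = Tuple[str, Tuple[str, str, str]]


def group_writes(owner: str, group_name: str,
                 members: Sequence[str]) -> List[Statement]:
    """Build one INSERT per member; the group name is repeated in every row."""
    return [(INSERT_MEMBER, (owner, group_name, member)) for member in members]


for cql, params in group_writes("user1", "family", ["user2"]):
    print(cql, params)
```

With `owner` as the partition key, all of a user's custom groups live in one partition, and `SELECT_MEMBERS` reads a single group without any index.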

Also note that I haven't used a 'chat_id' or a 'chats' table. We could easily add one, but your use case as stated doesn't seem to require it. In general, you cannot do joins in C*, so using a chat id would mean two queries.

EDIT: Secondary indexes are inefficient. With C* 3.0 and later, a materialized view is a better implementation.
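
On versions without materialized views, a common substitute is to maintain a second, query-specific table by hand and write every message to both tables. The sketch below shows only the write side and is an illustration under assumptions: the `chat_messages_by_recipient` table (imagined here with PRIMARY KEY (to_user, time)) and the helper name are hypothetical. The returned (statement, parameters) pairs are in the form the DataStax Python driver accepts for simple statements.

```python
import uuid
from typing import Any, List, Tuple

INSERT_BY_PAIR = (
    "INSERT INTO chat_messages "
    "(from_user, to_user, time, message_id, body, class) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)
# Hypothetical second table, partitioned by recipient for the inbox query.
INSERT_BY_RECIPIENT = (
    "INSERT INTO chat_messages_by_recipient "
    "(to_user, time, from_user, message_id, body, class) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)


def message_writes(from_user: str, to_user: str, body: str,
                   class_name: str) -> List[Tuple[str, Tuple[Any, ...]]]:
    """Build the two INSERTs that store one message in both tables."""
    time = uuid.uuid1()        # version-1 UUID, the kind a timeuuid column holds
    message_id = uuid.uuid4()  # random id shared by both copies
    return [
        (INSERT_BY_PAIR,
         (from_user, to_user, time, message_id, body, class_name)),
        (INSERT_BY_RECIPIENT,
         (to_user, time, from_user, message_id, body, class_name)),
    ]


for cql, params in message_writes("mike", "john", "hi", "friends"):
    print(cql.split(" (")[0], params[0])
```

Both copies share the same `time` and `message_id`, so a row found through either table refers to the same message.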

answered Nov 13 '22 01:11 by joscas


There is a chat application created by Alan Chandler on GitHub that has the features you are asking for:

  • MBchat

It uses two-phase authentication: the user is first validated against the forum, and then against the chat database.

Here's the first validation part of the schema (schema located in inc/user.sql):

BEGIN;

CREATE TABLE users (
  uid integer primary key autoincrement NOT NULL,
  time bigint DEFAULT (strftime('%s','now')) NOT NULL,
  name character varying NOT NULL,
  role text NOT NULL DEFAULT 'R',      -- A (CEO), L (DIRECTOR), G (DEPT HEAD), H (SPONSOR) R(REGULAR)
  cap integer DEFAULT 0 NOT NULL,      -- 1 = blind, 2 = committee secretary, 4 = admin, 8 = mod, 16 = speaker 32 = can't whisper( OR of capabilities).
  password character varying NOT NULL, -- raw password
  rooms character varying,             -- a ":" separated list of rooms nos which define which rooms the user can go in
  isguest boolean DEFAULT 0 NOT NULL
);
CREATE INDEX userindex ON users(name);
-- Below here you can add the specific users for your set up in the form of INSERT Statements

-- This list is test users to cover the complete range of functions. Note names are converted to lowercase, so only put lowercase names in here
INSERT INTO users(uid,name,role,cap,password,rooms,isguest) VALUES
(1,'alice','A',4,'password','7',0),     -- CEO class user alice
(2,'bob','L',3,'password','8',0),       -- DIRECTOR class user bob 
(3,'carol','G',2,'password','7:8:9',0); -- DEPT HEAD class user carol (excerpt ends here)
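
The `cap` and `rooms` columns pack several values into one field: `cap` is a bitwise OR of capability flags, and `rooms` is a ":"-separated list of room numbers. As a rough illustration of how an application might unpack them (the function names are made up here, and this is not code from MBchat), a Python sketch based only on the column comments above:

```python
from typing import Dict, List, Optional

# Flag values as listed in the comment on the cap column.
CAPABILITIES: Dict[int, str] = {
    1: "blind",
    2: "committee secretary",
    4: "admin",
    8: "mod",
    16: "speaker",
    32: "can't whisper",
}


def decode_cap(cap: int) -> List[str]:
    """Return the names of all capability bits set in cap."""
    return [name for bit, name in CAPABILITIES.items() if cap & bit]


def parse_rooms(rooms: Optional[str]) -> List[int]:
    """Split a ':'-separated room list; NULL or empty means no rooms."""
    return [int(r) for r in rooms.split(":")] if rooms else []


print(decode_cap(3))         # bob's cap value from the sample data
print(parse_rooms("7:8:9"))  # carol's room list from the sample data
```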



And here's the second validation part of the schema (schema located in data/chat.sql):

CREATE TABLE users (
  uid integer primary key NOT NULL,
  time bigint DEFAULT (strftime('%s','now')) NOT NULL,
  name character varying NOT NULL,
  role char(1) NOT NULL default 'R',
  rid integer NOT NULL default 0,
  mod char(1) NOT NULL default 'N',
  question character varying,
  private integer NOT NULL default 0,
  cap integer NOT NULL default 0,
  rooms character_varying 
);



The following is the schema for the chat rooms. The room types show how the user classes are applied, and the sample rows give an example of each:

CREATE TABLE rooms (
  rid integer primary key NOT NULL,
  name varchar(30) NOT NULL,
  type integer NOT NULL -- 0 = Open, 1 = meeting, 2 = guests can't speak, 3 moderated, 4 members(adult) only, 5 guests(child) only, 6 creaky door
) ;

INSERT INTO rooms (rid, name, type) VALUES 
(1, 'The Forum', 0),
(2, 'Operations Gallery', 2),  -- Guests Can't Speak
(3, 'Dungeon Club', 6),        -- creaky door
(4, 'Auditorium', 3),          -- Moderated Room
(5, 'Blue Room', 4),           -- Members Only (in Melinda's Backups this is Adults)
(6, 'Green Room', 5),          -- Guest Only (in Melinda's Backups this is Juveniles AKA Baby Backups)
(7, 'The Board Room', 1);      -- Various meeting rooms - need to be on users room list (excerpt ends here)



A separate table records which users participate in a conversation:

CREATE table wid_sequence ( value integer);
INSERT INTO wid_sequence (value) VALUES (1);

CREATE TABLE participant (
  uid integer NOT NULL REFERENCES users (uid) ON DELETE CASCADE ON UPDATE CASCADE,
  wid integer NOT NULL,
  primary key (uid,wid)
);



And the archives are recorded as follows:

CREATE TABLE chat_log (
  lid integer primary key,
  time bigint DEFAULT (strftime('%s','now')) NOT NULL,
  uid integer NOT NULL REFERENCES user (uid) ON DELETE CASCADE ON UPDATE CASCADE,
  name character varying NOT NULL,
  role char(1) NOT NULL,
  rid integer NOT NULL,
  type char(2) NOT NULL,
  text character varying
);

Edit: However, this type of data modeling is not well suited to Cassandra. In Cassandra the data is partitioned across many nodes rather than held on one machine, so joins are not supported, and denormalizing the data is the practical choice. Below is a denormalized version of the chat_log table:

CREATE TABLE chat_log (
  lid uuid,
  time timestamp,
  sender text,
  receiver text,
  room text,
  sender_role varchar,
  receiver_role varchar,
  rid decimal,
  status varchar,
  message text,
  -- NOT NULL dropped: the C* versions discussed here do not accept it in column definitions
  -- time and lid are clustering columns so that each message gets its own row, ordered by time
  PRIMARY KEY ((sender, receiver, room), time, lid)
  -- PRIMARY KEY ((sender, receiver), time, lid) if you don't want the messages to be separated by the rooms
) WITH CLUSTERING ORDER BY (time ASC, lid ASC);
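
Filling a denormalized table like this is the application's job: at write time it copies the sender's and receiver's roles and the room details into every message row, so no join is needed when reading. A minimal Python sketch of that step, assuming users and rooms are available as plain dictionaries; the helper name, the dictionary shapes, and the "sent" status value are assumptions made for illustration:

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict


def build_chat_log_row(sender: Dict[str, Any], receiver: Dict[str, Any],
                       room: Dict[str, Any], message: str,
                       status: str) -> Dict[str, Any]:
    """Copy user and room attributes into one self-contained message row."""
    return {
        "lid": uuid.uuid4(),
        "time": datetime.now(timezone.utc),
        "sender": sender["name"],
        "receiver": receiver["name"],
        "room": room["name"],
        "sender_role": sender["role"],      # duplicated from the user record
        "receiver_role": receiver["role"],  # duplicated from the user record
        "rid": room["rid"],
        "status": status,
        "message": message,
    }


# Sample records mirroring the test users and rooms shown earlier.
alice = {"name": "alice", "role": "A"}
bob = {"name": "bob", "role": "L"}
board_room = {"rid": 7, "name": "The Board Room"}

row = build_chat_log_row(bob, alice, board_room, "Agenda attached", "sent")
print(row["sender"], row["sender_role"], row["receiver_role"], row["room"])
```

The trade-off is that a later role change does not update old rows; each message keeps the role its sender had when it was written.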

Now, in order to retrieve the data, you'd use the following queries. Queries that do not restrict the full partition key depend on the secondary indexes created below, and Cassandra does not support ORDER BY on index-backed queries, so sort the results by time in the application.

Whenever a user wants to load the messages sent by a user, I want to retrieve them and send them to that user.

SELECT * FROM chat_log WHERE sender = 'bob';

I want to retrieve all the messages sent to a user by different users when that user requests them.

SELECT * FROM chat_log WHERE receiver = 'alice';

I want to store and retrieve classes of users.

SELECT * FROM chat_log WHERE sender_role = 'A'; -- messages sent by CEOs

SELECT * FROM chat_log WHERE receiver_role = 'A'; -- messages received by CEOs


After modeling the data, you need to create secondary indexes to support these queries. Each index needs its own name:

  • For retrieving all messages from different users to the user

CREATE INDEX chat_log_sender ON chat_log (sender);
CREATE INDEX chat_log_receiver ON chat_log (receiver);

  • For retrieving all messages from user classes

CREATE INDEX chat_log_sender_role ON chat_log (sender_role);
CREATE INDEX chat_log_receiver_role ON chat_log (receiver_role);


I believe these examples will give you the approach you need.

If you'd like to learn more about Cassandra data modeling, see the following:

  • Cassandra Data Modeling Best Practices, Part 1
  • Cassandra Data Modeling Best Practices, Part 2
  • Cassandra Data Modeling Best Practices Slide
  • Data Modeling Example
answered Nov 13 '22 00:11 by Tamer Tas