Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Update denormalized data in Cassandra

Let's say, we have users who can comment videos and we want to show all comments by video with user's name. Also user may go to his profile page and change his name.

Based on Cassandra data modeling practices that was covered in this answer Cassandra denormalization datamodel, I've create such tables:

CREATE TABLE users (
   user_id UUID,
   first_name TEXT,
   last_name TEXT,
   PRIMARY KEY ((user_id))
); 

CREATE TABLE comments_by_video (
   video_id UUID,
   added_at TIMESTAMP,
   user_id UUID,
   comment TEXT,
   first_name TEXT,
   last_name TEXT,
   PRIMARY KEY ((video_id), added_at, user_id)
);

Looks awesome, we can get data that needed to show comments by video just by one query.

Now, let's consider such use case.

User created a lot of comments (like 10 000) and then decided to change his name. Should we update all comments to change his name? Is there a way to make it efficient?

like image 402
NikolayS Avatar asked Jun 21 '16 08:06

NikolayS


1 Answers

Congratulation, you just enter the relationnal database zone !

More seriously, this requirement is a pain with your model. Either you have to use the user_id to query last_name and first name in users table for every comments at reads, or you need to look accross all partitions and all comments to replace the first_name and last_name everywhere. There is no way to make it efficient.

However, let's try a naive approach. You could create a users table, a videos table and another table that store all comments of a user like this one:

CREATE TABLE users_videos_comment(
    user_id uuid,
    video_id uuid,
    time timestamp,
    comment text,
    PRIMARY KEY ((user_id,video_id), time)
);

This is efficient for your new requirement, for a user and a video you can get all comments, so you just have to query users to look for the name, but you loose the "one query for all comments in a video". Also, you have to store in users a list of video where a user commented and on videos a list of users that made comments. This is difficult to maintain and will ask some more code.

There are maybe better ways to do it, but remember with noSQL What you loose on writes, you gain on reads

If you don't mind doing a lot of writes to change the username, then keep it as it is. From this post, Cassandra seems better for writes anyway, so you should think of optimizing the reads.

With that in mind, we can add a field in users that list all comments made by a user. This way, you won't have to scan across comments_by_video to look for every comments made by a user. This add some complexity because for any comments made by a user, you have to make two writes (and make sure it is consistent). But you have both requirements satisfied.

Hope it helps

like image 77
Whitefret Avatar answered Nov 02 '22 10:11

Whitefret