How to maintain data consistency across multiple tables in cassandra?

Question

I'm having trouble figuring out how to maintain attribute updates across multiple tables to ensure data consistency.

For example, suppose I have a many-to-many relationship between actors and fans. A fan can support many actors, and an actor can have many fans. I created several tables to support my queries:

CREATE TABLE fans (
    fan_id uuid,
    fan_attr_1 int,
    fan_attr_2 int,
    PRIMARY KEY ((fan_id))
);

CREATE TABLE actors (
    actor_id uuid,
    actor_attr_1 int,
    actor_attr_2 int,
    PRIMARY KEY ((actor_id))
);

CREATE TABLE actors_by_fan (
    fan_id uuid,
    actor_id uuid,
    actor_attr_1 int,
    actor_attr_2 int,
    PRIMARY KEY (fan_id, actor_id)
);

CREATE TABLE fans_by_actor (
    actor_id uuid,
    fan_id uuid,
    fan_attr_1 int,
    fan_attr_2 int,
    PRIMARY KEY (actor_id, fan_id)
);

Let's say I'm a fan and I'm on my settings page and I want to change my fan_attr_1 to a different value.

On the fans table I can update my attribute just fine since the application knows my fan_id and can key on that.

However I cannot change my fan_attr_1 on the fans_by_actor without first querying for the actor_ids tied to the fan.

This problem occurs any time you want to update any attribute of either fans or actors.

I've tried looking online for people experiencing similar problems, but I couldn't find any. For example, Datastax's Data Modeling course uses actors and videos in a many-to-many relationship, with tables actors_by_video and videos_by_actor. The course, like the other online resources I've consulted, discusses modeling tables after queries, but doesn't dig into how to maintain data integrity. In the actors_by_video table, what would happen if I wanted to change an actor's attribute? Wouldn't I have to go through every row of actors_by_video to find the partitions that contain the actor and update the attribute? That sounds very inefficient. The other option is to look up the video IDs beforehand, but I read elsewhere that reads before writes are an antipattern in Cassandra.

What would be the best approach for tackling this problem either from a data modeling standpoint or from a CQL standpoint?

EDIT: - Fixed sentence stubs - Added context and prior research

Clement Amarnath · Accepted Answer

Data Modeling

Cassandra is not a relational database, and data modeling for it follows some basic rules. At a high level, our data model should aim for the following goals:

1) Spread data evenly around the cluster

2) Minimize the number of partitions read
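To illustrate the second goal with the tables from the question, here is a minimal sketch of a query that touches a single partition. The UUID literal is a made-up placeholder, not a value from the question.

```cql
-- actor_id is the partition key of fans_by_actor, so this query
-- reads exactly one partition: all fans of a single actor.
-- The UUID below is a hypothetical example value.
SELECT fan_id, fan_attr_1, fan_attr_2
FROM fans_by_actor
WHERE actor_id = 123e4567-e89b-12d3-a456-426614174000;
```

Because the table is keyed by the value the query filters on, the read goes to one partition instead of fanning out across the cluster.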

Moreover, we should prefer a single big table over breaking the data into multiple tables and adding relationships between them. With this approach records get duplicated, but duplication is not a costly operation: it takes only a little more disk space, rather than CPU, memory, disk IOPS, or network.

Please note that there is a size restriction on column key names and values. The maximum column key (and row key) size is 64 KB. The maximum column value size is 2 GB. But because there is no streaming and the whole value is fetched into heap memory when requested, limit the size to only a few MB.

More Info:

http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling

http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2/

https://docs.datastax.com/en/cql/3.1/cql/cql_reference/refLimits.html

CQL

Consistency across tables can be maintained using batches or materialized views. Materialized views are available from version 3.0.
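As a rough sketch of the batch option for the fan_attr_1 update in the question: the application first reads the fan's own partition in actors_by_fan to collect the actor_ids, then writes the new value to every copy in one logged batch. All UUIDs and the value 42 are made-up placeholders, and the two actor rows stand in for whatever the first query returns.

```cql
-- Step 1: a single-partition read to find the actors this fan follows.
-- (fan_id below is a hypothetical example value.)
SELECT actor_id
FROM actors_by_fan
WHERE fan_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

-- Step 2: update every copy of the attribute in one logged batch.
-- One UPDATE on fans_by_actor is needed per actor_id returned above;
-- two hypothetical actor_ids are shown here.
BEGIN BATCH
    UPDATE fans
        SET fan_attr_1 = 42
        WHERE fan_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
    UPDATE fans_by_actor
        SET fan_attr_1 = 42
        WHERE actor_id = 123e4567-e89b-12d3-a456-426614174000
          AND fan_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
    UPDATE fans_by_actor
        SET fan_attr_1 = 42
        WHERE actor_id = 9f1c2d3e-4b5a-4c6d-8e7f-0a1b2c3d4e5f
          AND fan_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
APPLY BATCH;
```

A logged batch guarantees that all of its statements are eventually applied, but it gives no isolation across partitions, so readers may briefly see a mix of old and new values. Batches that span many partitions also put extra load on the coordinator, so for fans with very many actors it may be worth splitting the updates into smaller batches.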

Please see

How to ensure data consistency in Cassandra on different tables?
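For the materialized view option, here is a minimal sketch built on the actors_by_fan table from the question. The view name fans_by_actor_mv is my own placeholder. Note that a materialized view is derived from a single base table and cannot join, so this view can only expose the columns that actors_by_fan already stores (the actor attributes); it does not pull fan attributes from the fans table.

```cql
-- Re-keys actors_by_fan by actor_id so that "all fans of an actor"
-- becomes a single-partition read. Cassandra keeps the view in sync
-- with the base table, so there is no second table to write to.
-- The view's primary key must contain every primary key column of
-- the base table, and each of them must be filtered with IS NOT NULL.
CREATE MATERIALIZED VIEW fans_by_actor_mv AS
    SELECT actor_id, fan_id, actor_attr_1, actor_attr_2
    FROM actors_by_fan
    WHERE actor_id IS NOT NULL AND fan_id IS NOT NULL
    PRIMARY KEY (actor_id, fan_id);
```

This removes the need to maintain the reverse link table by hand, but the denormalized attribute columns in the base table still have to be updated by the application. Materialized views have also been treated as experimental in later Cassandra releases, so check the documentation for your version before relying on them.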

My preference would be to change the data model, design it around our queries, and if possible make it a single big table.

Hope it Helps!

Tags:

data-modeling

cassandra

cql

data-consistency

James Maa

1 Answers

Clement Amarnath
