Using Cassandra as an event store

Tags: cassandra

I want to experiment with using Cassandra as an event store in an event sourcing application. My requirements for an event store are quite simple. The event 'schema' would be something like this:

id: the id of an aggregate root entity
data: the serialized event data (e.g. JSON)
timestamp: when the event occurred
sequence_number: the unique version of the event

I am completely new to Cassandra so forgive me for my ignorance in what I'm about to write. I only have two queries that I'd ever want to run on this data.

Give me all events for a given aggregate root id
Give me all events for a given aggregate root id where sequence number is > x

My idea is to create a Cassandra table in CQL like this:

CREATE TABLE events (
  id uuid,
  seq_num int,
  data text,
  timestamp timestamp,
  PRIMARY KEY (id, seq_num)
);

Does this seem like a sensible way to model the problem? And, importantly, does using a compound primary key allow me to efficiently perform the queries I specified? Remember that, given the use case, there could be a large number of events (with a different seq_num) for the same aggregate root id.

My specific concern is that the second query is going to be inefficient in some way (I'm thinking about secondary indexes here...)


asked Oct 11 '13 15:10

DrewEaster

2 Answers

Your design seems to be well modeled in "Cassandra terms". The queries you need are indeed supported on tables with a compound primary key; you would have something like:

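query 1: SELECT * FROM events WHERE id = ?;
query 2: SELECT * FROM events WHERE id = ? AND seq_num > ?;

(The ? markers are bind parameters. If you inline the values instead, note that a uuid literal is written without quotes.)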

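I do not think the second query is going to be inefficient: id is the partition key and seq_num is a clustering column, so the range restriction reads a contiguous slice of a single partition and needs no secondary index. It may, however, return a lot of rows. If that is a problem, you can cap the number of events returned with the LIMIT keyword.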
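To illustrate why the range query can stay cheap, here is a minimal Python sketch. It is not Cassandra code and not how Cassandra is implemented internally; it is a toy in-memory model, and the class and method names (EventsTable, events_after, and so on) are made up for illustration. It assumes only what the table definition says: rows are grouped by id and kept ordered by seq_num within each group.

```python
from bisect import bisect_left, bisect_right


class EventsTable:
    """Toy model of a table with PRIMARY KEY (id, seq_num)."""

    def __init__(self):
        # partition key (id) -> (sorted seq_nums, rows in the same order)
        self._partitions = {}

    def insert(self, agg_id, seq_num, data):
        seqs, rows = self._partitions.setdefault(agg_id, ([], []))
        pos = bisect_left(seqs, seq_num)
        if pos < len(seqs) and seqs[pos] == seq_num:
            # Same primary key: overwrite the row, like a CQL upsert.
            rows[pos] = (seq_num, data)
        else:
            seqs.insert(pos, seq_num)
            rows.insert(pos, (seq_num, data))

    def all_events(self, agg_id):
        # Query 1: everything in one partition, already in seq_num order.
        _, rows = self._partitions.get(agg_id, ([], []))
        return list(rows)

    def events_after(self, agg_id, seq_num, limit=None):
        # Query 2: locate the first row with a larger seq_num, then take
        # a contiguous slice. No other partition is touched.
        seqs, rows = self._partitions.get(agg_id, ([], []))
        start = bisect_right(seqs, seq_num)
        end = len(rows) if limit is None else start + limit
        return rows[start:end]
```

Calling events_after("some-id", 2, limit=100) in this model corresponds to query 2 with a LIMIT of 100: one lookup by id, one seek to the position after seq_num 2, then a sequential read.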

Using a compound primary key seems like a good match for your specific requirements. Secondary indexes do not seem to bring much to the table, unless I am missing something in your design/requirements.

HTH.

answered Sep 21 '22 19:09

emgsilva

What you've got is good, except in the case of many events for a particular aggregate, where a single partition can grow without bound. One thing you could do is create static columns to hold "next" and "max_sequence". The idea is that the static columns would hold the current max sequence for this partition and the "artificial id" of the next partition. You could then store, say, 100 or 1000 events per partition. What you've essentially done then is bucket the events for an aggregate into multiple partitions. This means additional overhead for querying and storing, but it protects against unbounded growth. You might even create a lookup table of the partitions for an aggregate. It really depends on your use case and how "clever" you want it to be.
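As a rough sketch of the bucketing idea, here is a simpler variant than the static-column chain described above: it assumes fixed-size buckets, so the bucket number can be computed from the sequence number and the partition key would become (id, bucket). The bucket size, the function names, and the assumption that the current max sequence is known (for example from a "max_sequence" column) are all hypothetical.

```python
BUCKET_SIZE = 1000  # hypothetical: events stored per partition


def bucket_for(seq_num, bucket_size=BUCKET_SIZE):
    """Bucket (partition suffix) that holds the event with this seq_num."""
    return seq_num // bucket_size


def buckets_to_read(after_seq, max_seq, bucket_size=BUCKET_SIZE):
    """Buckets that may hold events with after_seq < seq_num <= max_seq."""
    if max_seq <= after_seq:
        return []  # nothing newer than after_seq exists
    first = bucket_for(after_seq + 1, bucket_size)
    last = bucket_for(max_seq, bucket_size)
    return list(range(first, last + 1))
```

A reader wanting events with seq_num > 1500 when the max sequence is 3200 would query buckets 1, 2 and 3 in order, applying the seq_num > 1500 restriction only within bucket 1. The trade-off is the one mentioned above: several queries instead of one, in exchange for partitions of bounded size.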

answered Sep 17 '22 19:09

ashic
