Using an RDBMS as event sourcing storage


The event store should not need to know about the specific fields or properties of events. Otherwise every modification of your model would result in having to migrate your database (just as in good old-fashioned state-based persistence). Therefore I wouldn't recommend options 1 and 2 at all.

Below is the schema as used in Ncqrs. As you can see, the table "Events" stores the related data as a CLOB (i.e. JSON or XML). This corresponds to your option 3 (Only that there is no "ProductEvents" table because you only need one generic "Events" table. In Ncqrs the mapping to your Aggregate Roots happens through the "EventSources" table, where each EventSource corresponds to an actual Aggregate Root.)

Table Events:
    Id [uniqueidentifier] NOT NULL,
    TimeStamp [datetime] NOT NULL,

    Name [varchar](max) NOT NULL,
    Version [varchar](max) NOT NULL,

    EventSourceId [uniqueidentifier] NOT NULL,
    Sequence [bigint], 

    Data [nvarchar](max) NOT NULL

Table EventSources:
    Id [uniqueidentifier] NOT NULL, 
    Type [nvarchar](255) NOT NULL, 
    Version [int] NOT NULL
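
To make the usage concrete, here is a minimal sketch of how appending and replaying could look against a schema of this shape (this is illustrative, not code taken from Ncqrs; the @-parameters and the JSON payload are placeholders):

    -- Append one event to a stream (illustrative only; Ncqrs drives this through its own API)
    INSERT INTO Events (Id, TimeStamp, Name, Version, EventSourceId, Sequence, Data)
    VALUES (NEWID(), GETUTCDATE(), 'ProductRenamed', '1.0',
            @EventSourceId, @NextSequence, '{"newName": "Widget"}');

    -- Bump the aggregate's version so concurrent writers can detect a conflict
    UPDATE EventSources SET Version = Version + 1 WHERE Id = @EventSourceId;

    -- Replay: load the full stream of one Aggregate Root, in order
    SELECT Data FROM Events WHERE EventSourceId = @EventSourceId ORDER BY Sequence;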

The SQL persistence mechanism of Jonathan Oliver's Event Store implementation consists basically of one table called "Commits" with a BLOB field "Payload". This is pretty much the same as in Ncqrs, only that it serializes the event's properties in binary format (which, for instance, adds encryption support).

Greg Young recommends a similar approach, as extensively documented on Greg's website.

The schema of his prototypical "Events" table reads:

Table Events
    AggregateId [Guid],
    Data [Blob],
    SequenceNumber [Long],
    Version [Int]
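
The optimistic-concurrency side of that approach is usually described together with a table holding each aggregate's current version. That table isn't shown in the excerpt above, so treat the following as a rough sketch of the idea (run inside one transaction) rather than Greg's actual scripts:

    -- Sketch: only accept the append if nobody else has written in the meantime
    UPDATE Aggregates
    SET CurrentVersion = @ExpectedVersion + 1
    WHERE AggregateId = @AggregateId AND CurrentVersion = @ExpectedVersion;

    IF @@ROWCOUNT = 0
        -- Another writer won the race; abort instead of appending
        RAISERROR('Concurrency conflict', 16, 1);
    ELSE
        INSERT INTO Events (AggregateId, Data, SequenceNumber, Version)
        VALUES (@AggregateId, @Payload, @NextSequenceNumber, @ExpectedVersion + 1);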

The GitHub project CQRS.NET has a few concrete examples of how you could do event stores in a few different technologies. At the time of writing there is an implementation in SQL using Linq2SQL (with a SQL schema to go with it), one for MongoDB, one for DocumentDB (CosmosDB if you're in Azure) and one using EventStore (as mentioned above). There are also more for Azure, like Table Storage and Blob Storage, which is very similar to flat-file storage.

I guess the main point here is that they all conform to the same principle/contract. They all store information in a single place/container/table, they use metadata to tell one event from another, and they 'just' store the whole event as it was, serialised in some cases and stored natively in technologies that support it. So whether you pick a document database, a relational database or even flat files, there are several different ways to reach the same intent of an event store (which is useful if you change your mind at any point and find you need to migrate to, or support, more than one storage technology).

As a developer on the project I can share some insights on some of the choices we made.

Firstly, we found that (even with unique UUIDs/GUIDs instead of integers) sequential IDs crop up for many strategic reasons, so just having an ID wasn't unique enough for a key. We therefore merged our main ID key column with the data/object type to create what should be a truly unique key (in the sense of your application). I know some people say you don't need to store the type, but that will depend on whether you are greenfield or having to co-exist with existing systems.
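
A minimal sketch of what that merged key can look like in SQL (the table and column names here are purely illustrative, not a prescribed schema):

    CREATE TABLE EventStore (
        AggregateId   uniqueidentifier NOT NULL,
        AggregateType nvarchar(255)    NOT NULL,   -- the data/object type
        Sequence      bigint           NOT NULL,
        Data          nvarchar(max)    NOT NULL,
        -- An ID alone may collide or repeat across object types,
        -- so the key is the ID combined with the type.
        CONSTRAINT PK_EventStore PRIMARY KEY (AggregateId, AggregateType, Sequence)
    );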

We stuck with a single container/table/collection for maintainability reasons, but we did play around with a separate table per entity/object. We found in practice that meant either the application needed "CREATE" permissions (which generally speaking is not a good idea... generally, there's always exceptions/exclusions), or that each time a new entity/object came into existence or was deployed, new storage containers/tables/collections needed to be made. We found this was painfully slow for local development and problematic for production deployments. You may not, but that was our real-world experience.

Another thing to remember is that asking for action X to happen may result in many different events occurring, so knowing all the events generated by a command/event/whatever is useful. They may also be across different object types, e.g. pushing "buy" in a shopping cart may trigger account and warehousing events to fire. A consuming application may want to know all of this, so we added a CorrelationId. This means a consumer can ask for all events raised as a result of their request. You'll see that in the schema.
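
As a rough illustration (continuing the hypothetical EventStore table sketched above, with an extra CorrelationId column; the real CQRS.NET schema has its own names), that lets a consumer pull back everything a single request caused:

    -- Every event written while handling one command carries the caller's CorrelationId
    SELECT AggregateType, AggregateId, Sequence, Data
    FROM EventStore
    WHERE CorrelationId = @CorrelationId
    ORDER BY Sequence;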

Specifically with SQL, we found that performance really became a bottleneck if indexes and partitions weren't used adequately. Remember that events will need to be streamed in reverse order if you are using snapshots. We tried a few different indexes and found in practice that some additional indexes were needed for debugging in-production, real-world applications. Again, you'll see that in the schema.
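
As an illustration (again against the hypothetical table above, not the project's actual index scripts), indexes along these lines cover the two access patterns that matter most: replaying one aggregate's stream in order, and walking it backwards from the newest event when starting from a snapshot:

    -- Replay a single aggregate's stream in order
    CREATE INDEX IX_EventStore_Aggregate
        ON EventStore (AggregateId, AggregateType, Sequence);

    -- Read the stream newest-first when rebuilding from a snapshot
    SELECT TOP (100) Sequence, Data
    FROM EventStore
    WHERE AggregateId = @AggregateId AND AggregateType = @AggregateType
    ORDER BY Sequence DESC;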

Other in-production metadata was useful during production-based investigations; timestamps gave us insight into the order in which events were persisted versus raised. That gave us some assistance on a particularly heavily event-driven system that raised vast quantities of events, giving us information about the performance of things like the networks and the distribution of the systems across the network.


Well, you might want to have a look at Datomic.

Datomic is a database of flexible, time-based facts, supporting queries and joins, with elastic scalability, and ACID transactions.

I wrote a detailed answer here

You can watch a talk from Stuart Halloway explaining the design of Datomic here

Since Datomic stores facts in time, you can use it for event sourcing use cases, and so much more.


I think solutions 1 & 2 can become a problem very quickly as your domain model evolves. New fields are created, some change meaning, and some are no longer used. Eventually your table will have dozens of nullable fields, and loading the events will be a mess.

Also, remember that the event store should be used only for writes, you only query it to load the events, not the properties of the aggregate. They are separate things (that is the essence of CQRS).
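
To make that separation concrete, the read side is typically just another denormalised table that a projection keeps up to date from the event stream; here is a sketch with made-up columns:

    -- Queries go against a read model, never against the event store's Data column
    CREATE TABLE ProductReadModel (
        ProductId         uniqueidentifier PRIMARY KEY,
        Name              nvarchar(255) NOT NULL,
        Price             decimal(18, 2) NOT NULL,
        LastEventSequence bigint NOT NULL   -- how far this projection has caught up
    );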

Solution 3 is what people usually do; there are many ways to accomplish it.

As an example, EventFlow CQRS, when used with SQL Server, creates a table with this schema:

CREATE TABLE [dbo].[EventFlow](
    [GlobalSequenceNumber] [bigint] IDENTITY(1,1) NOT NULL,
    [BatchId] [uniqueidentifier] NOT NULL,
    [AggregateId] [nvarchar](255) NOT NULL,
    [AggregateName] [nvarchar](255) NOT NULL,
    [Data] [nvarchar](max) NOT NULL,
    [Metadata] [nvarchar](max) NOT NULL,
    [AggregateSequenceNumber] [int] NOT NULL,
 CONSTRAINT [PK_EventFlow] PRIMARY KEY CLUSTERED 
(
    [GlobalSequenceNumber] ASC
)
)

where:

  • GlobalSequenceNumber: Simple global identification, may be used for ordering or for identifying missing events when you create your projection (read model).
  • BatchId: An identification of the group of events that were inserted atomically (TBH, I have no idea why this would be useful)
  • AggregateId: Identification of the aggregate
  • Data: Serialized event
  • Metadata: Other useful information from the event (e.g. event type used for deserialization, timestamp, originator id from the command, etc.)
  • AggregateSequenceNumber: Sequence number within the same aggregate (this is useful if you cannot have writes happening out of order, so you use this field for optimistic concurrency; see the sketch after this list)
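
A minimal way to get that optimistic concurrency from the schema itself (a sketch, not taken from EventFlow's published scripts) is a unique constraint on the aggregate id plus its per-aggregate sequence number, so a stale writer's insert simply fails:

    ALTER TABLE [dbo].[EventFlow]
        ADD CONSTRAINT [UQ_EventFlow_Aggregate]
        UNIQUE ([AggregateId], [AggregateSequenceNumber]);

    -- A writer that lost the race tries to insert a duplicate
    -- (AggregateId, AggregateSequenceNumber) pair and gets a constraint
    -- violation instead of silently rewriting history.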

However, if you are creating from scratch, I would recommend following the YAGNI principle and creating it with the minimal required fields for your use case.


A possible hint: a design following "Slowly Changing Dimension" (type 2) should help you to cover:

  • order of events occurring (via surrogate key)
  • durability of each state (valid from - valid to)

A left fold function should also be okay to implement, but you need to think about future query complexity.
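
A sketch of that idea (table and column names are just illustrative): each state gets a surrogate key for ordering and a validity window, with the current state marked by an open-ended valid_to:

    CREATE TABLE product_state (
        surrogate_id bigint IDENTITY(1,1) PRIMARY KEY,  -- gives the order of state changes
        product_id   uniqueidentifier NOT NULL,
        state        nvarchar(max)    NOT NULL,
        valid_from   datetime2        NOT NULL,
        valid_to     datetime2        NULL              -- NULL = current state
    );

    -- The SCD type-2 transition: close the old state, open the new one
    UPDATE product_state SET valid_to = @Now
    WHERE product_id = @ProductId AND valid_to IS NULL;

    INSERT INTO product_state (product_id, state, valid_from, valid_to)
    VALUES (@ProductId, @NewState, @Now, NULL);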


I reckon this is a late answer, but I would like to point out that using an RDBMS as event-sourcing storage is totally possible if your throughput requirements are not high. I'll just show you examples of an event-sourcing ledger I built to illustrate.

  • https://github.com/andrewkkchan/client-ledger-service: an event-sourcing ledger web service.
  • https://github.com/andrewkkchan/client-ledger-core-db: uses an RDBMS to compute states, so you can enjoy all the advantages that come with an RDBMS, like transaction support.
  • https://github.com/andrewkkchan/client-ledger-core-memory: another consumer that processes in memory, to handle bursts.

One could argue that the actual event store above still lives in Kafka, as an RDBMS is slow at inserting, especially when the inserts are always appends.

I hope the code helps give you an illustration, apart from the very good theoretical answers already provided for this question.