 

CQRS, Event Sourcing and Scaling

It's clear that a system based on these patterns is easily scalable. But I would like to ask: how exactly? I have a few questions regarding scalability:

  1. How to scale aggregates? If I create multiple instances of aggregate A, how do I sync them? If one of the instances processes a command and creates an event, should this event be propagated to every instance of that aggregate?
  2. Shouldn't there be some business logic that decides which instance of the aggregate to request? If I am issuing multiple commands that apply to aggregate A (ORDERS) and to one specific order, it makes sense to deliver them to the same instance. Or not?

In this article: https://initiate.andela.com/event-sourcing-and-cqrs-a-look-at-kafka-e0c1b90d17d8, they use Kafka with partitioning. The user management service (the aggregate) is scaled, but each instance is subscribed only to a specific partition of the topic, which contains all events of a particular user.

Thanks!

Ondrej Tomcik asked Jun 06 '18 08:06





2 Answers

How to scale aggregates?

  • Choose aggregates carefully: make sure your commands are spread reasonably among many aggregates. You don't want an aggregate that is likely to receive a high number of commands from concurrent users.

  • Serialize commands sent to an aggregate instance. This can be done with an aggregate repository and a command bus/queue. But for me, the simplest way is to use optimistic locking with aggregate versioning, as described in this post by Michiel Rook.
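A minimal sketch of optimistic locking with aggregate versioning, assuming an in-memory store. `InMemoryEventStore`, `ConcurrencyError` and the event name are hypothetical. The version is simply the number of events already in the aggregate's stream, and an append is accepted only if the stream is still at the version the handler read. A real event store must perform that check-and-append atomically (for example with a unique constraint on aggregate ID plus version); this sketch is not thread-safe.

```python
from dataclasses import dataclass, field


class ConcurrencyError(Exception):
    """Raised when the stream moved on since the aggregate was loaded."""


@dataclass
class InMemoryEventStore:
    # Hypothetical stand-in for a real event store: stream id -> ordered events.
    streams: dict = field(default_factory=dict)

    def load(self, stream_id):
        """Return (events, version); version = number of events stored so far."""
        events = list(self.streams.get(stream_id, []))
        return events, len(events)

    def append(self, stream_id, expected_version, new_events):
        """Append only if nobody else wrote since expected_version was read."""
        stream = self.streams.setdefault(stream_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"{stream_id}: expected version {expected_version}, "
                f"found {len(stream)}"
            )
        stream.extend(new_events)
        return len(stream)  # the new version


# Two handlers load the same aggregate concurrently; both see version 0.
store = InMemoryEventStore()
_, seen_by_a = store.load("order-1")
_, seen_by_b = store.load("order-1")

store.append("order-1", seen_by_a, ["OrderWasCreated"])  # accepted, version -> 1
conflict_detected = False
try:
    store.append("order-1", seen_by_b, ["OrderWasCreated"])  # stale version 0
except ConcurrencyError:
    conflict_detected = True  # B would now reload and retry (or reject) its command
```

The losing writer never overwrites anything; it just learns that its view was stale, which serializes commands per aggregate without any lock held between requests.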

Which instance of the aggregate to request?

In our reSolve framework, we create an aggregate instance on every command and don't keep it between requests. This works surprisingly fast: it is faster to fetch 100 events and reduce them to the aggregate state than to find the right aggregate instance in a cluster.
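"Reducing events to aggregate state" is a left fold over the stream. The sketch below illustrates the idea in Python; it is not reSolve's API (reSolve itself is JavaScript), and the event names and state shape are hypothetical.

```python
from functools import reduce


def apply(state, event):
    """Pure projection function: previous state + one event -> new state.

    Events are hypothetical (event_type, payload) tuples for an Order aggregate.
    """
    kind, payload = event
    if kind == "OrderWasCreated":
        return {"status": "created", "items": []}
    if kind == "ItemAdded":
        return {**state, "items": state["items"] + [payload]}
    if kind == "OrderWasShipped":
        return {**state, "status": "shipped"}
    return state  # ignore event types this aggregate does not care about


def rehydrate(events, initial=None):
    """Rebuild the aggregate state by folding every stored event, in order."""
    return reduce(apply, events, initial)


history = [("OrderWasCreated", None), ("ItemAdded", "book"), ("OrderWasShipped", None)]
order = rehydrate(history)  # built fresh for this command, then thrown away
```

Because `apply` is pure, the same history always yields the same state on any machine, which is what makes it safe to rebuild the aggregate per command instead of keeping it in memory.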

This approach is scalable and lets you go serverless: one Lambda invocation per command, with no shared state in between. The rare cases where an aggregate has too many events are solved with snapshots.

Roman Eremin answered Sep 28 '22 04:09



How to scale aggregates?

The Aggregate instances are represented by their stream of events. Every Aggregate instance has its own stream of events. Events from one Aggregate instance are NOT used by other Aggregate instances. For example, if Order Aggregate with ID=1 creates an OrderWasCreated event with ID=1001, that Event will NEVER be used to rehydrate other Order Aggregate instances (with ID=2,3,4...).

Given that, you scale the Aggregates horizontally by creating shards in the Event store based on the Aggregate ID.
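A minimal sketch of sharding by Aggregate ID, assuming shards are modeled as in-memory dicts; `shard_for` and `ShardedEventStore` are hypothetical names. Each Aggregate ID maps to exactly one stream on exactly one shard, so no two shards ever need to coordinate. A stable digest is used because Python's built-in `hash()` of a string is salted per process and would route the same ID differently on different servers.

```python
import hashlib


def shard_for(aggregate_id: str, shard_count: int) -> int:
    """Map an Aggregate ID to a shard index with a process-independent hash."""
    digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedEventStore:
    """Hypothetical router: one stream per Aggregate ID, streams spread over shards."""

    def __init__(self, shard_count: int):
        # each shard is modeled as a dict: aggregate id -> list of events
        self.shards = [dict() for _ in range(shard_count)]

    def _shard(self, aggregate_id):
        return self.shards[shard_for(aggregate_id, len(self.shards))]

    def append(self, aggregate_id, events):
        self._shard(aggregate_id).setdefault(aggregate_id, []).extend(events)

    def load(self, aggregate_id):
        # only the owning shard is consulted; other aggregates' events are never read
        return list(self._shard(aggregate_id).get(aggregate_id, []))


store = ShardedEventStore(4)
store.append("order-1", ["OrderWasCreated"])
store.append("order-2", ["OrderWasCreated", "ItemAdded"])
```

Plain modulo reshuffles most IDs when the shard count changes; consistent hashing or a fixed, large number of logical partitions avoids that. Kafka's default partitioner applies the same key-to-partition idea to keyed records, which is why the article in the question can route all events of one user to one partition.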

If I create multiple instances of aggregate A, how do I sync them? If one of the instances processes a command and creates an event, should this event be propagated to every instance of that aggregate?

You don't. Each Aggregate instance is completely separated from other instances.

To scale command processing horizontally, it is recommended to load the Aggregate instance from the Event store every time, by replaying all of its previously generated events. There is one optimization you can make to boost performance, Aggregate snapshots, but it is recommended only if it is really needed. This answer could help.
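A minimal sketch of the snapshot optimization, assuming the event history is an in-memory list and the aggregate state is just an item count; `Snapshot`, `load_aggregate`, `maybe_snapshot` and the `every` threshold are hypothetical. A snapshot stores the folded state plus the version it covers, so loading replays only the tail of the stream.

```python
from dataclasses import dataclass


def apply(state, event):
    """Hypothetical projection: the state is the number of items in the order."""
    kind, _payload = event
    return state + 1 if kind == "ItemAdded" else state


@dataclass(frozen=True)
class Snapshot:
    state: int
    version: int  # how many events were already folded into `state`


def load_aggregate(events, snapshot=None):
    """Rebuild state; with a snapshot, replay only the events after it.

    A real event store would fetch just the events with
    version > snapshot.version instead of slicing an in-memory list.
    """
    state, start = (snapshot.state, snapshot.version) if snapshot else (0, 0)
    for event in events[start:]:
        state = apply(state, event)
    return state, len(events)


def maybe_snapshot(events, snapshot=None, every=100):
    """Take a new snapshot only once `every` events piled up since the last one."""
    last = snapshot.version if snapshot else 0
    if len(events) - last < every:
        return snapshot
    state, version = load_aggregate(events, snapshot)
    return Snapshot(state, version)
```

Snapshots are a cache: the stream stays the source of truth, and deleting every snapshot must still produce the same state, just more slowly. That is why they can be added later, only for the aggregates that need them.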

Shouldn't there be some business logic that decides which instance of the aggregate to request? If I am issuing multiple commands that apply to aggregate A (ORDERS) and to one specific order, it makes sense to deliver them to the same instance. Or not?

You are assuming that the Aggregate instances run continuously in some servers' RAM. You could do that, but such an architecture is very complex. For example, what happens when one of the servers goes down and must be replaced by another? It's hard to determine which instances were living there and to restart them. Instead, you could have many stateless servers that can handle commands for any of the aggregate instances. When a command arrives, you identify the Aggregate ID, load the aggregate from the Event store by replaying all its previous events, and then it can execute the command. After the command is executed and the new events are persisted to the Event store, you can discard the Aggregate instance. The next command that arrives for the same Aggregate instance can be handled by any other stateless server. So scalability is dictated only by the scalability of the Event store itself.
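The load, execute, persist, discard cycle can be sketched as a single stateless function. Everything here is hypothetical: an in-memory `EventStore` stands in for the shared Event store, `decide` holds the business rules, and a conflicting concurrent write (detected through the expected version) triggers a reload and retry. Any server could run `handle`, because nothing survives between calls except what is in the store.

```python
class ConcurrencyError(Exception):
    """Another server appended to the same stream first."""


class EventStore:
    """Hypothetical in-memory stand-in for the shared Event store."""

    def __init__(self):
        self.streams = {}

    def load(self, aggregate_id):
        events = list(self.streams.get(aggregate_id, []))
        return events, len(events)

    def append(self, aggregate_id, expected_version, new_events):
        stream = self.streams.setdefault(aggregate_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(aggregate_id)
        stream.extend(new_events)


def apply(state, event):
    """Fold one (hypothetical) Order event into the state."""
    return {"OrderWasCreated": "created", "OrderWasShipped": "shipped"}.get(event, state)


def decide(state, command):
    """Business rules: validate a command against current state, return new events."""
    if command == "CreateOrder":
        if state is not None:
            raise ValueError("order already exists")
        return ["OrderWasCreated"]
    if command == "ShipOrder":
        if state != "created":
            raise ValueError(f"cannot ship an order in state {state!r}")
        return ["OrderWasShipped"]
    raise ValueError(f"unknown command {command!r}")


def handle(store, aggregate_id, command, max_retries=3):
    """Stateless handler: load, decide, persist, then forget the aggregate."""
    for _ in range(max_retries):
        events, version = store.load(aggregate_id)
        state = None
        for event in events:  # rehydrate from the full stream
            state = apply(state, event)
        new_events = decide(state, command)
        try:
            store.append(aggregate_id, version, new_events)
            return new_events  # the in-memory aggregate is dropped here
        except ConcurrencyError:
            continue  # someone else wrote first: reload and re-decide
    raise ConcurrencyError(f"{aggregate_id}: gave up after {max_retries} attempts")
```

Re-running `decide` on the reloaded state after a conflict matters: the command might no longer be valid once the other writer's events are taken into account.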

Constantin Galbenu answered Sep 30 '22 04:09
