I'm working on a CQRS/ES architecture. We run multiple asynchronous projections into the read stores in parallel because some projections might be much slower than others and we want to stay more in sync with the write side for the faster projections.
I'm trying to understand the approaches for generating the read models, and how much data duplication each of them might entail.
Let's take an order with items as a simplified example. An order can have multiple items, each item has a name. Items and orders are separate aggregates.
I could either save the read models in a more normalized fashion, creating an entity or document for each item and order and referencing them, or save them in a more denormalized manner, where an order directly contains its items.
Normalized
{
    Id: Order1,
    Items: [Item1, Item2]
}
{
    Id: Item1,
    Name: "Foosaver 9000"
}
{
    Id: Item2,
    Name: "Foosaver 7500"
}
Using a more normalized format would allow a single projection to process events that affect items and orders and update the corresponding objects. It would also mean that any change to an item's name affects all orders that reference it. A customer might, for example, get a delivery note for different items than the corresponding invoice (so obviously that model might not be good enough and might lead us to the same issues as denormalizing...)
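For illustration, a single projection over this normalized model could look roughly like the sketch below. The event types (OrderCreated, ItemAdded, ItemRenamed), the IDocumentStore interface and the document classes are hypothetical placeholders, not part of any specific framework:

// Sketch: one projection maintains both normalized document types.
public class NormalizedOrderProjection
{
    private readonly IDocumentStore _store; // hypothetical document store

    public NormalizedOrderProjection(IDocumentStore store) => _store = store;

    public void Handle(OrderCreated e) =>
        _store.Save(new OrderDocument { Id = e.OrderId, Items = new List<Guid>() });

    public void Handle(ItemAdded e)
    {
        _store.Save(new ItemDocument { Id = e.ItemId, Name = e.Name });

        var order = _store.Load<OrderDocument>(e.OrderId);
        order.Items.Add(e.ItemId);
        _store.Save(order);
    }

    // Renaming touches a single document, but implicitly changes the name
    // shown on every order that references the item (including orders
    // that were already placed).
    public void Handle(ItemRenamed e)
    {
        var item = _store.Load<ItemDocument>(e.ItemId);
        item.Name = e.NewName;
        _store.Save(item);
    }
}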
Denormalized
{
    Id: Order1,
    Items: [
        {Id: Item1, Name: "Foosaver 9000"},
        {Id: Item2, Name: "Foosaver 7500"}
    ]
}
Denormalizing, however, requires some source from which I can look up the current related data, such as the item. This means that I either have to transport all the information I might need in the event, or I have to keep track of the data that I source for my denormalization. It also means I might have to do this once for each projection, i.e. I might need a denormalized ItemForOrder as well as a denormalized ItemForSomethingElse, both containing only the bare minimum of properties that each denormalized entity or document needs (whenever it is created or modified).
If I shared the same Item in the read store, I could end up mixing item definitions from different points in time, because the projections for items and orders might not run at the same pace. In the worst case, the projection for items might not yet have created the item I need to source its properties from.
Generally, what approaches do I have when processing relationships from an event stream?
update 2016-06-17
Currently, I'm solving this by running a single projection per denormalised read model and its related data. If I have multiple read models that have to share the same related data, then I might put them into the same projection to avoid duplicating the same related data I need for the lookup.
These related models might even be somewhat normalised, optimised for however I have to access them. My projection is the only thing that reads and writes to them, so I know exactly how they are read.
// related data
public class Item
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    /* and whatever else is needed but not provided by events */
}

// denormalised info for document
public class ItemInfo
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

// denormalised data as document
public class ItemStockLevel
{
    public ItemInfo Item { get; set; } // when this is a document
    public decimal Quantity { get; set; }
}

// or for RDBMS
public class ItemStockLevel
{
    public Guid ItemId { get; set; }
    public string ItemName { get; set; }
    public decimal Quantity { get; set; }
}
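For example, the projection behind ItemStockLevel might look roughly like the following sketch. The event types (ItemRegistered, StockLevelChanged) and the IDocumentStore interface are assumptions for illustration; the point is that the same projection maintains both the related data and the denormalised document:

// Sketch: one projection owns both the Item lookup and the
// denormalised ItemStockLevel documents.
public class ItemStockLevelProjection
{
    private readonly IDocumentStore _store; // hypothetical document store

    public ItemStockLevelProjection(IDocumentStore store) => _store = store;

    // Maintain the related data from item events.
    public void Handle(ItemRegistered e) =>
        _store.Save(new Item { Id = e.ItemId, Name = e.Name });

    // Denormalise using the related data this projection already tracks.
    // Because no other projection writes these documents, there is no
    // race against a separately-paced item projection.
    public void Handle(StockLevelChanged e)
    {
        var item = _store.Load<Item>(e.ItemId);
        _store.Save(new ItemStockLevel
        {
            Item = new ItemInfo { Id = item.Id, Name = item.Name },
            Quantity = e.Quantity
        });
    }
}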
However, the more hidden issue here is when to update which related data. This depends heavily on the business process.
For example, I wouldn't want to change the item descriptions of an order after it has been placed. When the projection processes an event, I must only update the data that the business process says should change.
Therefore, an argument could be made for putting this information into the event (and using the data as the client sent it?). If we find that we need additional data later, we might have to fall back to projecting the related data from the event stream and reading it from there...
This could be seen as a similar issue for pure CQRS architectures: when do you update the denormalised data in your documents? When do you refresh the data before presenting it to the user? Again, the business process might drive this decision.
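As a sketch of what such business-process-driven updates could look like (the CartDocument type, its Placed flag and the store's Query method are hypothetical): a rename updates the related data and any open carts, but deliberately leaves placed orders untouched:

// Sketch: the business process decides which documents an event touches.
public void Handle(ItemRenamed e)
{
    // Related data: always reflects the current catalogue.
    var item = _store.Load<Item>(e.ItemId);
    item.Name = e.NewName;
    _store.Save(item);

    // Open carts follow the catalogue...
    foreach (var cart in _store.Query<CartDocument>(c => !c.Placed))
    {
        cart.RenameItem(e.ItemId, e.NewName);
        _store.Save(cart);
    }

    // ...but placed orders are deliberately not updated: their ItemInfo
    // snapshots must keep matching the delivery note and the invoice.
}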
First, I think you want to be careful in your aggregates about life cycles. In the usual shopping cart domain, the cart (Order) lifecycle spans that of the items. Udi Dahan wrote Don't Create Aggregate Roots, which I've found to mean that aggregates hold a reference to the aggregate that "created" them, rather than the other way around.
Therefore, I would expect the event history to look like
// Assuming Orders come from Customers
OrderCreated(orderId: Order1, customerId: Customer1)
ItemAdded(itemId: Item1, orderId: Order1, Name:"Foosaver 9000")
ItemAdded(itemId: Item2, orderId: Order1, Name:"Foosaver 7500")
Now, it's still the case that there are no guarantees here about ordering - that's going to depend on how the aggregates are designed in the write model, whether your event store linearizes events across different histories, and so on.
Notice that in your normalized views, you could go from the order to the items, but not the other way around. Processing the events I've described gives you the same kind of limitation, only inverted: instead of orders with mysterious items, you have items with mysterious orders. Anybody who looks for an order either doesn't see it yet, sees it empty, or sees it with some number of items, and can follow links from those items to the key-value store.
Your normalized forms in your key-value store don't need to change from your example; the projection responsible for writing the normalized form of orders needs to be smart enough to watch the item streams too, but it's all good.
(Also note: we're eliding ItemRemoved here)
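As a sketch, that projection can tolerate the missing cross-stream ordering guarantees by upserting; here ItemAdded may arrive before OrderCreated (the store interface and document types are illustrative):

// Sketch: upserting makes the projection insensitive to the relative
// order in which the order and item streams are delivered.
public void Handle(OrderCreated e)
{
    var order = _store.Load<OrderDocument>(e.OrderId)
        ?? new OrderDocument { Id = e.OrderId, Items = new List<Guid>() };
    order.CustomerId = e.CustomerId;
    _store.Save(order);
}

public void Handle(ItemAdded e)
{
    // ItemAdded may be processed before OrderCreated; create a
    // placeholder order instead of failing.
    var order = _store.Load<OrderDocument>(e.OrderId)
        ?? new OrderDocument { Id = e.OrderId, Items = new List<Guid>() };
    order.Items.Add(e.ItemId);
    _store.Save(order);
}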
That's OK, but it misses the point that reads happen more often than writes. For hot queries, you are going to want the denormalized form available: the data in the store is the DTO that you are going to send in response to the query. For example, if the query were supporting a report on the order (no edits allowed), then you wouldn't need to send the item ids either.
{
    Title: "Your order #{Order1}",
    Items: [
        {Name: "Foosaver 9000"},
        {Name: "Foosaver 7500"}
    ]
}
One thing that you might consider is tracking the versions of the aggregates in question, so that when the user navigates from one view to the next, rather than getting a stale projection, the query pauses and waits for the new projection to catch up.
For instance, if your DTO were hypermedia, then it might look something like
{
    Title: "Your order #{Order1}",
    refreshUrl: /orders/Order1?atLeastVersion=20,
    Items: [
        {Name: "Foosaver 9000", detailsUrl: /items/Item1?atLeastVersion=7},
        {Name: "Foosaver 7500", detailsUrl: /items/Item2?atLeastVersion=9}
    ]
}
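On the read side, honouring atLeastVersion might look roughly like the sketch below. It assumes the projection stamps each document with the version it last processed; the store API and the polling approach are illustrative, not prescriptive:

// Sketch: wait (bounded) for the projection to catch up instead of
// returning a document we know is stale.
public async Task<OrderDocument> GetOrder(Guid orderId, long atLeastVersion)
{
    var deadline = DateTime.UtcNow.AddSeconds(5);
    while (DateTime.UtcNow < deadline)
    {
        var doc = _store.Load<OrderDocument>(orderId);
        if (doc != null && doc.Version >= atLeastVersion)
            return doc;

        await Task.Delay(100); // projection not caught up yet; poll again
    }

    // Deadline passed: fall back to whatever is there (possibly stale).
    return _store.Load<OrderDocument>(orderId);
}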