Event sourcing: denormalizing relationships in projections

Tags:

I'm working on a CQRS/ES architecture. We run multiple asynchronous projections into the read stores in parallel because some projections might be much slower than others and we want to stay more in sync with the write side for the faster projections.

I'm trying to understand the approaches on how I can generate the read models and how much data-duplication this might entail.

Let's take an order with items as a simplified example. An order can have multiple items, each item has a name. Items and orders are separate aggregates.

I could either try to save the read models in a more normalized fashion, where I create an entity or document for each item and order and then reference them - or I maybe would like to save it in a more denormalized manner where I have an order which contains items.

Normalized

{
  Id: Order1,
  Items: [Item1, Item2]
}

{
  Id: Item1,
  Name: "Foosaver 9000"
}

{
  Id: Item2,
  Name: "Foosaver 7500"
}

Using a more normalized format would allow a single projection to process events that affect/effect item and orders and update the corresponding objects. It would also mean that any changes to the item name affect all orders. A customer might get a delivery note for different items than the corresponding invoice for example (so obviously that model might not be good enough and lead us to the same issues as denormalizing...)

Denormalized

{
  Id: Order1,
  Items: [
    {Id: Item1, Name: "Foosaver 9000"},
    {Id: Item2, Name: "Foosaver 7500"},
  ]
}

Denormalizing however would require some source where I can look up the current related data - such as the item. This means that I either have to transport all the information I might need in the event, or I'll have to keep track of the data that I source for my denormalization. This would also mean that I might have to do this once for each projection - i.e. I might need a denormalized ItemForOrder as well as a denormalized ItemForSomethingElse - both only containing the bare minimum properties that each of the denormalized entities or documents need (whenever they are created or modified).

If I would share the same Item in the read store, I could end up mixing item definitions from different points of time, because the projections for items and orders might not run at the same pace. In the worst case, the projection for items might not have yet created the item I need to source for its properties.

Generally, what approaches do I have when processing relationships from an event stream?

update 2016-06-17

Currently, I'm solving this by running a single projection per denormalised read model and its related data. If I have multiple read models that have to share the same related data, then I might put them into the same projection to avoid duplicating the same related data I need for the lookup.

These related models might even be somewhat normalised, optimised for however I have to access them. My projection is the only thing that reads and writes to them, so I know exactly how they are read.

// related data 
public class Item 
{
  public Guid Id {get; set;}
  public string Name {get; set;}
  /* and whatever else is needed but not provided by events */
}

// denormalised info for document
public class ItemInfo 
{
  public Guid Id {get; set;}
  public string Name {get; set;}
}

// denormalised data as document
public class ItemStockLevel
{
  public ItemInfo Item {get; set;} // when this is a document
  public decimal Quantity {get; set;}
}

// or for RDBMS
public class ItemStockLevel
{
  public Guid ItemId {get; set;}
  public string ItemName {get; set;}
  public decimal Quantity {get; set;}
}

However, the more hidden issue here is that of when to update which related data. This is heavily dependent on the business process.

For example, I wouldn't want to change the item descriptions of an order after it has been placed. I must only update the data that changed according to the business process when the projection processes an event.

Therefore, the argument could be made towards putting this information into the event (and using the data as the client sent it?). If we find that we need additional data later, then we might have to fall back to projecting the related data from the event stream and read it from there...

This could be seen as a similar issue for pure CQRS architectures: when do you update the denormalised data in your documents? When do you refresh the data before presenting it to the user? Again, the business process might drive this decision.

801

asked May 25 '16 12:05

urbanhusky

1 Answers

First, I think you want to be careful in your aggregates about life cycles. In the usual shopping cart domain, the cart (Order) lifecycle spans that of the items. Udi Dahan wrote Don't Create Aggregate Roots, which I've found to mean that aggregates hold a reference to the aggregate that "created" them, rather than the other way around.

Therefore, I would expect the event history to look like

// Assuming Orders come from Customers
OrderCreated(orderId: Order1, customerId: Customer1)

ItemAdded(itemId: Item1, orderId: Order1, Name:"Foosaver 9000")

ItemAdded(itemId: Item2, orderId: Order1, Name:"Foosaver 7500")

Now, it's still the case that there are no guarantees here about ordering - that's going to depend on how the aggregates are designed in the write model, whether your event store linearizes events across different histories, and so on.

Notice that in your normalized views, you could go from the order to the items, but not the other way around. Processing the events I've described gives you that same limitation: instead of Orders with mysterious items, you have items with mysterious orders. Anybody who looks for an order either doesn't see it yet, sees it empty, or sees it with some number of items; and can follow links from those items to the key store.

Your normalized forms in your key value store don't need to change from your example; the projection responsible for writing the normalized form of orders needs to be smart enough to watch the item streams too, but its all good.

(Also note: we're eliding ItemRemoved here)

That's ok, but it misses on the idea that reads happen more often than writes. For hot queries, you are going to want the denormalized form available: the data in the store is the DTO that you are going to send in response to the query. For example, if the query were supporting a report on the order (no edits allowed), then you wouldn't need to send the item ids either.

{
    Title: "Your order #{Order1}",
    Items: [
        {Name: "Foosaver 9000"},
        {Name: "Foosaver 7500"}
    ]
}

One thing that you might consider is tracking the versions of the aggregates in question, so that when the user navigates from one view to the next -- rather than getting a stale projection, the query pauses waiting for the new projection to catch up.

For instance, if your DTO were hypermedia, then it might looks something like

{
    Title: "Your order #{Order1}",
    refreshUrl: /orders/Order1?atLeastVersion=20,
    Items: [
        {Name: "Foosaver 9000", detailsUrl: /items/Item1?atLeastVersion=7},
        {Name: "Foosaver 7500", detailsUrl: /items/Item2?atLeastVersion=9}
    ]
}

119

answered Oct 31 '22 03:10

VoiceOfUnreason

Related questions
                            
                                ggplot of Dymaxion/Airocean (icosahedral) map projection
                            
                                how do I re-project points in a camera - projector system (after calibration)
                            
                                Projection Matrix for Pseudo Cylindrical Projection
                            
                                Z Value after Perspective Divide is always less than -1
                            
                                Get Longitude Laltitude of a point in my Worldmap in Mollweide projection
                            
                                Conditionally include a field (_id or other) in mongodb project aggregation?
                            
                                spring jpa nested projection generating incorrect query
                            
                                OpenLayers Google Maps Projection Problem w/ KML
                            
                                Spring Data Projection with OneToMany returns too many results
                            
                                How to modify only one or two field(s) in LINQ projections?
                            
                                Project Scipy Voronoi diagram from 3d to 2d
                            
                                back-projecting a 2D point to 3D Plucker line
                            
                                Transforming coordinates of feature in openlayers
                            
                                How can I determine if a point is hidden on a projection?
                            
                                Hibernate Projections List
                            
                                Given an array of records, how can I get an array representing a field from each one?
                            
                                IQueryable Lambda Projection Syntax
                            
                                opengl oblique projection
                            
                                Converting EPSG projection bounds to a D3.js map
                            
                                Convert lat/lon to pixel coordinate?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Event sourcing: denormalizing relationships in projections

Tags:

projection

denormalization

cqrs

event-sourcing

urbanhusky

People also ask

1 Answers

VoiceOfUnreason

Recent Activity

Donate For Us