I need to handle distributed transactions in a microservice architecture. In theory, one of the best ways to do that is the Saga Orchestration pattern. The problem is that I could not find any detailed information on how to make it scale.
Let's use the example below. There will be many CreateOrderSaga instances, because I can (and will) run more than one OrderService.API. If CreateOrderSaga is a kind of state machine, does that mean a single coordinator must handle all of its steps by itself, or can other coordinators take over its work?
And what if that API crashes while running the saga: can other saga coordinators continue from the same state the crashed API left it in? What is the best way to handle this situation? How can an event store help?
Let me explain in detail:
Coordinator1, in one of the Order.APIs, starts the CreateOrderSaga.
The CreateOrderSaga in Coordinator1 creates an order in the pending state.
Then Coordinator1 crashes for some reason (maybe the power goes out). The order stays pending and nobody is looking after it anymore. Someone should either continue processing it or mark it as failed, but who has that responsibility? Some compensating transactions may also be needed.
So is it OK for one saga coordinator to start a process and for others to continue it?
How can a saga coordinator be scaled?
Solution:
I chose MassTransit to manage distributed transactions.
Microservices are often deployed across multi-cloud environments, which increases risk, reduces control over and visibility into application components, and creates additional vulnerable points.
The main benefit of the Saga Pattern is that it helps maintain data consistency across multiple services without tight coupling. This is an extremely important aspect for a microservices architecture. However, the main disadvantage of the Saga Pattern is the apparent complexity from a programming point of view.
Although microservices offer many advantages, they also come with a higher degree of complexity. This complexity can be a major challenge for organizations that are not used to working with microservices. Additionally, because microservices are so independent, it can be difficult to track down errors and resolve them.
The Saga design pattern is a way to manage data consistency across microservices in distributed transaction scenarios. A saga is a sequence of transactions that updates each service and publishes a message or event to trigger the next transaction step.
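The definition above can be illustrated with a minimal, in-process Python sketch of an orchestrator: it runs each local transaction in order and, if one fails, runs the compensations of the already completed steps in reverse order. The step names (`create_order`, `reserve_credit`, and so on) are hypothetical, and a real orchestrator would exchange commands and replies over a message broker and persist its progress instead of calling functions directly.

```python
from typing import Callable, List, Tuple

# Each step: (name, action, compensation). Steps mutate a shared context dict.
Step = Tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_saga(steps: List[Step], ctx: dict) -> bool:
    """Run local transactions in order; if one fails, compensate the
    completed ones in reverse order and report failure."""
    completed: List[Step] = []
    log = ctx.setdefault("log", [])
    for step in steps:
        name, action, _ = step
        try:
            action(ctx)
        except Exception as exc:
            log.append(f"{name} failed: {exc}")
            for done_name, _, compensate in reversed(completed):
                compensate(ctx)
                log.append(f"compensated {done_name}")
            return False
        completed.append(step)
    return True


# Hypothetical steps for the CreateOrder example.
def create_order(ctx: dict) -> None:
    ctx["order"] = "PENDING"


def cancel_order(ctx: dict) -> None:
    ctx["order"] = "CANCELLED"


def reserve_credit(ctx: dict) -> None:
    raise RuntimeError("insufficient credit")  # simulate a failing participant


def release_credit(ctx: dict) -> None:
    ctx.pop("credit", None)


ctx: dict = {}
ok = run_saga(
    [("create_order", create_order, cancel_order),
     ("reserve_credit", reserve_credit, release_credit)],
    ctx,
)
print(ok, ctx["order"], ctx["log"])
```

Here the credit step fails, so the already created order is cancelled by its compensation and `run_saga` returns `False`.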
The Saga pattern should be implemented as an asynchronous process, where asynchronous means message-based. Most message queues (RabbitMQ, for example) support acknowledgements. Here I'm going to describe stateless services (i.e. it's OK to handle CreateOrder requests in different instances of OrderService).
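Acknowledgements are what make crashes survivable. In RabbitMQ, for example, a delivery that a consumer has not acknowledged is requeued when that consumer's channel or connection closes. The toy Python broker below imitates that behaviour for a single consumer; the class and method names are made up for illustration and are not any real client library's API.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

Delivery = Tuple[int, str]  # (delivery tag, message body)


class AckingQueue:
    """Toy in-memory broker with manual acknowledgements and a single consumer
    (hypothetical; not a real client API)."""

    def __init__(self) -> None:
        self._ready: Deque[Delivery] = deque()
        self._unacked: Dict[int, str] = {}
        self._next_tag = 0

    def publish(self, body: str) -> None:
        self._next_tag += 1
        self._ready.append((self._next_tag, body))

    def receive(self) -> Optional[Delivery]:
        if not self._ready:
            return None
        tag, body = self._ready.popleft()
        self._unacked[tag] = body  # delivered, but the broker still keeps it
        return tag, body

    def ack(self, tag: int) -> None:
        del self._unacked[tag]  # processing finished: the broker drops the message

    def consumer_crashed(self) -> None:
        # The consumer's connection is gone: put unacknowledged deliveries back
        # at the head of the queue, oldest first, so another consumer gets them.
        pending = sorted(self._unacked.items())
        self._ready.extendleft(reversed(pending))
        self._unacked.clear()


q = AckingQueue()
q.publish("CreateOrder #42")
first = q.receive()    # instance A takes the message...
q.consumer_crashed()   # ...and dies before calling ack()
second = q.receive()   # instance B receives the same message
q.ack(second[0])       # B finishes and acknowledges it
```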
You click the "make an order" button, a CreateOrder message is sent to the message queue, and OrderService receives this message from the queue. This scales because you can run many instances of OrderService.
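Running several instances against one queue is usually called the competing-consumers pattern: each message goes to only one of the instances. The Python sketch below uses threads and `queue.Queue` as a stand-in for separate OrderService processes and a real broker; the instance names and message format are made up for illustration. Which instance handles which message is non-deterministic; what the sketch shows is that every message is handled once and the work is shared.

```python
import queue
import threading
from typing import List, Optional, Tuple


def order_service_instance(name: str, q: "queue.Queue[Optional[str]]",
                           processed: List[Tuple[str, str]],
                           lock: threading.Lock) -> None:
    """One OrderService instance: take messages from the shared queue until told to stop."""
    while True:
        msg = q.get()
        if msg is None:  # shutdown sentinel
            return
        with lock:
            processed.append((name, msg))


def run(num_instances: int = 3, num_messages: int = 30) -> List[Tuple[str, str]]:
    q: "queue.Queue[Optional[str]]" = queue.Queue()
    processed: List[Tuple[str, str]] = []
    lock = threading.Lock()
    instances = [
        threading.Thread(target=order_service_instance,
                         args=(f"OrderService-{i}", q, processed, lock))
        for i in range(num_instances)
    ]
    for t in instances:
        t.start()
    for n in range(num_messages):
        q.put(f"CreateOrder #{n}")
    for _ in instances:
        q.put(None)  # one sentinel per instance, queued after all real messages
    for t in instances:
        t.join()
    return processed


results = run()
```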
Then we have two cases:
1. Orchestration-based saga: OrderService receives a message and instantiates a coordinator, and the coordinator calls CustomerService. If OrderService fails before it finishes processing the message, the CreateOrder message won't be acknowledged in the message queue. Subsequently, another instance of OrderService will receive the message and try to process it. If CustomerService fails during the call, you can either fail the entire CreateOrder message and retry it later, or retry just that particular call to CustomerService.
2. Choreography-based saga: OrderService receives a message and tries to process it. If it fails, the situation is the same: the message won't be acknowledged and will be redelivered for another attempt later. This approach is about emitting events such as OrderCreated, CustomerCreated, etc. (i.e. it's event-oriented).
Of course, you should configure monitoring and alerting for your services to make sure the system is alive and able to process messages.
You should also consider whether you need to implement compensation logic or checks. Imagine you make two HTTP POST requests to different services while processing a message: the first call succeeds but the second one fails. If you retry the entire CreateOrder message, you should not call the first service again.
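One way to avoid repeating the first call is to record, per message, which steps have already succeeded and skip them on redelivery. The Python sketch below assumes a hypothetical `StepLog` (in practice a durable table) and uses fake service functions in place of the two HTTP POSTs.

```python
from typing import Callable, Dict, Set


class StepLog:
    """Hypothetical durable record of which steps already succeeded per message."""

    def __init__(self) -> None:
        self._done: Dict[str, Set[str]] = {}

    def is_done(self, message_id: str, step: str) -> bool:
        return step in self._done.get(message_id, set())

    def mark_done(self, message_id: str, step: str) -> None:
        self._done.setdefault(message_id, set()).add(step)


def process_create_order(message_id: str, log: StepLog,
                         post_to_service_1: Callable[[], None],
                         post_to_service_2: Callable[[], None]) -> None:
    """Handle (or re-handle) a CreateOrder message, skipping steps that already succeeded."""
    for step, call in (("service_1", post_to_service_1),
                       ("service_2", post_to_service_2)):
        if log.is_done(message_id, step):
            continue  # finished on an earlier attempt: don't POST again
        call()  # raises on failure, so the message is not acknowledged
        log.mark_done(message_id, step)


calls = {"service_1": 0, "service_2": 0}


def fake_service_1() -> None:
    calls["service_1"] += 1


def fake_service_2() -> None:
    calls["service_2"] += 1
    if calls["service_2"] == 1:
        raise ConnectionError("service 2 unavailable")  # only the first attempt fails


log = StepLog()
try:
    process_create_order("msg-1", log, fake_service_1, fake_service_2)
except ConnectionError:
    pass  # the broker would redeliver the unacknowledged message
process_create_order("msg-1", log, fake_service_1, fake_service_2)
```

After the retry, the first service has been called once and the second twice. One gap remains: if the process dies after the first POST succeeds but before `mark_done` runs, the retry will call the first service again, so the downstream services should ideally be idempotent as well (for example, by deduplicating on a request ID).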
Further reading: overview of sagas, coordinating sagas, choreography-based sagas:
In order for the communication to be reliable, it’s essential that the saga participants use a message broker that guarantees at-least-once delivery and has durable subscriptions. That’s because at-least-once delivery and durable subscriptions ensure that a saga completes even if a participant is temporarily unavailable. A message will sit in the message broker’s channel (e.g. queue or topic) until the participant is able to successfully process it.
For more ideas on how to implement this properly, read how the NServiceBus saga framework is implemented. It's a .NET framework, but the concepts are language-agnostic.