In the context of a microservice architecture, a message-driven, asynchronous, event-based design seems to be gaining popularity (see, for example, the Message Driven trait of the Reactive Manifesto) as opposed to a synchronous (possibly REST-based) mechanism.
Taking that context, imagine an overly simplified ordering system with the following message flow:
1. An order is placed, which publishes a CreateOrderEvent.
2. The Inventory service subscribes to CreateOrderEvent, does some inventory stuff and publishes an InventoryUpdatedEvent when it's done.
3. The next service subscribes to InventoryUpdatedEvent, sends an invoice and publishes an EmailInvoiceEvent.
All services are up and we happily process orders... Everyone is happy. Then, the Inventory service goes down for some reason 😬
Assume that the events on the event bus are flowing in a "non-blocking" manner, i.e. the messages are published to a central topic and do not pile up on a queue if no service is reading from it (what I'm trying to convey is an event bus where an event, once published, flows "straight through" and does not queue up; ignore which messaging platform/technology is used at this point). That would mean that if the Inventory service were down for 5 minutes, the CreateOrderEvents passing through the event bus during that time are now "gone", or at least never seen by the Inventory service, because in our overly simplified system no other system is interested in those events.
My question then is: How does the Inventory service (and the system as a whole) restore state in a way that no orders are missed/not processed?
The receiving system will process the request but return an error result. In event-driven systems, the events that carry requests can simply drop a notification into a messaging queue and move on. That means a separate process will be responsible for detecting the error and handling it as appropriate.
Scalability is another operational challenge associated with microservices architecture. Although the scalability of microservices is often touted as an advantage, successfully scaling your microservice-based applications is challenging. Optimizing and scaling require more complex coordination.
To begin with, in an event-driven microservice architecture, services communicate with each other via event messages. When business events occur, producers publish them as messages. At the same time, other services consume them through event listeners.
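As a rough illustration of producers publishing and listeners consuming, here is a minimal in-memory sketch. The `EventBus` class and its methods are hypothetical names made up for this example, not the API of any real messaging platform. It also shows the "straight through" behaviour from the question: an event with no subscriber at publish time is simply not delivered to anyone.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Hypothetical in-memory bus: delivers each event only to current subscribers."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, listener: Callable[[dict], None]) -> None:
        self._listeners[event_type].append(listener)

    def publish(self, event_type: str, payload: dict) -> int:
        """Deliver the payload to every listener and return how many received it."""
        listeners = list(self._listeners.get(event_type, []))
        for listener in listeners:
            listener(payload)
        return len(listeners)


bus = EventBus()
seen: List[dict] = []

# A consumer registers an event listener for the event type it cares about.
bus.subscribe("CreateOrderEvent", seen.append)

# A producer publishes a business event; the listener receives it.
delivered = bus.publish("CreateOrderEvent", {"order_id": 1})

# Nobody is listening for this type, so the event is not retained anywhere.
missed = bus.publish("InventoryUpdatedEvent", {"order_id": 1})
```

Nothing here persists events, which is exactly why a consumer that is down misses whatever was published in the meantime.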
To publish a basic event, at least two technologies are needed: a storage system and a message queueing protocol. Of the benefits of this approach, the most important is loose coupling: because we want to separate the components in a microservice architecture, all of the units must be sufficiently independent of each other.
The retry microservice’s job is to track and action all retries. This microservice receives an event, writing it to its own topics with both the event to retry and the timestamp to retry that event. It then pushes out these retry events once their timestamp has been reached.
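The retry behaviour described above could be sketched roughly as follows. This is only an illustration under assumptions: `RetryScheduler`, `schedule` and `pop_due` are hypothetical names, and an in-memory heap stands in for the service's "own topics". The current time is passed in explicitly so the example stays deterministic.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class RetryScheduler:
    """Hypothetical retry store: keeps (retry timestamp, event) pairs ordered by time."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Any]] = []
        # Tie-breaker so events with equal timestamps are never compared directly.
        self._seq = itertools.count()

    def schedule(self, event: Any, retry_at: float) -> None:
        """Record the event together with the timestamp at which to retry it."""
        heapq.heappush(self._heap, (retry_at, next(self._seq), event))

    def pop_due(self, now: float) -> List[Any]:
        """Remove and return every event whose retry timestamp has been reached."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, _, event = heapq.heappop(self._heap)
            due.append(event)
        return due

    def pending(self) -> int:
        return len(self._heap)


scheduler = RetryScheduler()
scheduler.schedule({"type": "CreateOrderEvent", "order_id": 7}, retry_at=105.0)
scheduler.schedule({"type": "CreateOrderEvent", "order_id": 8}, retry_at=130.0)

early = scheduler.pop_due(now=100.0)  # no timestamp reached yet
ready = scheduler.pop_due(now=110.0)  # only order 7 is due
```

A real retry service would persist this state and re-publish the due events to the bus; both are left out here.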
Good question! So there are basically three forces at play here:
For both #1 and #2 you want some sort of persistent log of events. A traditional message queue/topic may provide this, though you have to consider the cases where messages may be processed out of order with respect to transactions/exceptions/fault behaviors. A simpler log like Apache BookKeeper, Apache Kafka, AWS Kinesis, etc. can store/persist these types of events in sequence and leave it to the consumers to process them in order, filter out duplicates, partition streams, etc.
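To make the persistent-log idea a bit more concrete, here is a minimal in-memory sketch. It is not how Kafka, BookKeeper or Kinesis are actually used; `EventLog`, `InventoryConsumer` and the `event_id` field are hypothetical names assumed for illustration. The point is that the log keeps events in sequence, and the consumer remembers its own offset and filters duplicates, so it can catch up after being down.

```python
from typing import List, Set


class EventLog:
    """Hypothetical append-only log: events are kept in order and never dropped."""

    def __init__(self) -> None:
        self._events: List[dict] = []

    def append(self, event: dict) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the appended event

    def read_from(self, offset: int) -> List[dict]:
        return self._events[offset:]


class InventoryConsumer:
    """Tracks its own position in the log and skips events it has already handled."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        self.offset = 0
        self.processed_ids: Set[str] = set()
        self.handled_orders: List[int] = []

    def poll(self) -> int:
        """Process everything after the stored offset; return how many were new."""
        handled = 0
        for event in self.log.read_from(self.offset):
            self.offset += 1
            if event["event_id"] in self.processed_ids:
                continue  # duplicate delivery, already handled
            self.processed_ids.add(event["event_id"])
            self.handled_orders.append(event["order_id"])
            handled += 1
        return handled


log = EventLog()
consumer = InventoryConsumer(log)

log.append({"event_id": "e1", "order_id": 1})
first = consumer.poll()

# The consumer is "down" while these arrive; the log keeps them anyway.
log.append({"event_id": "e2", "order_id": 2})
log.append({"event_id": "e2", "order_id": 2})  # the same event published twice
log.append({"event_id": "e3", "order_id": 3})

caught_up = consumer.poll()  # back up: replays from the stored offset
```

In a real system the offset and the set of processed ids would themselves need to be stored durably, otherwise a restart loses the consumer's position.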
Number 3, to me, is a state machine. How you implement the state machine is really up to you. Basically, this state machine keeps track of what events have happened and transitions to allowed states (and potentially participates in emitting events/commands) based on the events in the other systems.
For example, a real-world use case might look like an "escrow" when you're trying to close on a house. The escrow company not only handles the financial transaction, but usually also works with the real-estate agent to coordinate getting papers in order, papers signed, money transferred, etc. After each event, the escrow changes state from "waiting for buyer signature" to "waiting for seller signature" to "waiting for funds" to "closed success" ... they even have deadlines for these events to happen, and can transition to another state like "transaction closed, not finished" if, say, the money doesn't get transferred.
This state machine in your example would listen on the pub/sub channels, capture this state, run timers, emit other events to move the involved systems forward, etc. It doesn't necessarily "orchestrate" them per se, but it does track the progress and enforce timeouts and compensations where needed. This could be implemented as a stream processor, as a process engine, or (imho the best place to start) just a simple "escrow" service.
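A very small sketch of such an "escrow"-style tracker for the order flow in the question might look like this. Everything here is an assumption made for illustration: the state names, the `OrderTracker` class and the 300-second step timeout are not taken from any particular framework, and compensation is reduced to marking the order as timed out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class State(Enum):
    AWAITING_INVENTORY = "awaiting inventory"
    AWAITING_INVOICE = "awaiting invoice"
    COMPLETED = "completed"
    TIMED_OUT = "timed out"


# Allowed transitions: (current state, event type) -> next state.
TRANSITIONS: Dict[Tuple[Optional[State], str], State] = {
    (None, "CreateOrderEvent"): State.AWAITING_INVENTORY,
    (State.AWAITING_INVENTORY, "InventoryUpdatedEvent"): State.AWAITING_INVOICE,
    (State.AWAITING_INVOICE, "EmailInvoiceEvent"): State.COMPLETED,
}


@dataclass
class OrderProgress:
    state: State
    deadline: float  # time by which the next event is expected


class OrderTracker:
    def __init__(self, step_timeout: float = 300.0) -> None:
        self.step_timeout = step_timeout
        self.orders: Dict[int, OrderProgress] = {}

    def on_event(self, order_id: int, event_type: str, now: float) -> bool:
        """Apply an event; return False if it is a duplicate or unexpected."""
        progress = self.orders.get(order_id)
        current = progress.state if progress else None
        new_state = TRANSITIONS.get((current, event_type))
        if new_state is None:
            return False
        self.orders[order_id] = OrderProgress(new_state, now + self.step_timeout)
        return True

    def expire(self, now: float) -> List[int]:
        """Mark orders that missed their deadline and return their ids."""
        expired = []
        for order_id, progress in self.orders.items():
            waiting = progress.state in (State.AWAITING_INVENTORY, State.AWAITING_INVOICE)
            if waiting and now > progress.deadline:
                progress.state = State.TIMED_OUT
                expired.append(order_id)
        return expired


tracker = OrderTracker(step_timeout=300.0)
tracker.on_event(1, "CreateOrderEvent", now=0.0)
tracker.on_event(1, "InventoryUpdatedEvent", now=10.0)
tracker.on_event(1, "EmailInvoiceEvent", now=20.0)
tracker.on_event(2, "CreateOrderEvent", now=0.0)  # Inventory never responds

duplicate_accepted = tracker.on_event(1, "EmailInvoiceEvent", now=25.0)
stalled = tracker.expire(now=301.0)
```

The ids returned by `expire` are where a real service would emit a follow-up event or trigger a compensation, for example re-publishing the CreateOrderEvent for the stalled order.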
There's actually more to keep track of, like what happens if the "escrow" service goes down/fails, how it handles duplicates, how it handles unexpected events given its state, how it contributes to duplicate events, etc... but hopefully this is enough to get started.