I'm learning about microservice data replication right now, and one thing I'm having trouble with is coming up with the right architecture for ensuring event atomicity. The way I understand it, the basic flow is:

1. The microservice commits its changes to its own database.
2. The microservice publishes an event describing those changes to the event bus.
3. Other microservices consume the event and apply the changes to their own data stores.
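To make that concrete, here's roughly what the naive version looks like in Python (sqlite3 standing in for my main database, and `publish_to_bus()` for whatever event bus client I'd actually use; the names are just illustrative):

```python
import json
import sqlite3

def publish_to_bus(event: dict) -> None:
    """Stand-in for a real event bus client (Kafka, RabbitMQ, etc.)."""
    print("published:", json.dumps(event))

db = sqlite3.connect("orders.db")
db.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, total REAL)")

# Step 1: commit the business change to the service's own database.
with db:  # the connection context manager commits on success
    order_id = db.execute("INSERT INTO orders (total) VALUES (?)", (42.0,)).lastrowid

# <-- a crash or power outage here means the row above persists,
#     but the event below is never published

# Step 2: publish an event describing the change to the event bus.
publish_to_bus({"type": "OrderCreated", "order_id": order_id, "total": 42.0})
```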
But what if, for example, a power outage occurred between Steps 1 and 2? In a naively built system, that would mean the changes persist but the event describing them is never published. I've pondered the following ideas to create better guarantees, but I'm not quite sure of all the pros and cons of each:
A: Use an embedded database (like SQLite) in my microservice instance to track the full transaction, from the commit to the main database to the event publishing.
B: Create an Events table in my main database, using database transactions to insert the Event and commit the relevant changes at the same time. The service would then push the Event to the bus and make another commit to the main database to mark the Event as Published. (A rough sketch of this appears after option C below.)
C: As above, create an Events table in my main database, using database transactions to insert the Event and commit the relevant changes at the same time. Then, notify (either manually via REST/Messages from within the service or via database hooks) a dedicated EventPusher service that a new event has been appended. The EventPusher service will query the Events table and push the events to the bus, marking each one as Published upon acknowledgement. Should a certain amount of time pass without any notification, the EventPusher will do a manual query.
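To make B (and the first half of C) concrete, here's a rough sketch of what I have in mind, again with sqlite3 and a stand-in `publish_to_bus()`; the table and column names are just illustrative:

```python
import json
import sqlite3

def publish_to_bus(event: dict) -> None:
    """Stand-in for a real event bus client."""
    print("published:", json.dumps(event))

db = sqlite3.connect("orders.db")
db.executescript("""
    CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, total REAL);
    CREATE TABLE IF NOT EXISTS events (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")

# Shared step of B and C: insert the business row and the Event row in the
# same local transaction, so they commit or roll back together.
with db:
    order_id = db.execute("INSERT INTO orders (total) VALUES (?)", (42.0,)).lastrowid
    payload = json.dumps({"type": "OrderCreated", "order_id": order_id, "total": 42.0})
    event_id = db.execute("INSERT INTO events (payload) VALUES (?)", (payload,)).lastrowid

# Option B: the service itself pushes the Event and then makes a second commit
# to mark it as Published. A crash between these two steps leaves a published
# event still marked as unpublished, so whatever retries it later may publish
# it twice (at-least-once rather than exactly-once).
publish_to_bus(json.loads(payload))
with db:
    db.execute("UPDATE events SET published = 1 WHERE id = ?", (event_id,))
```

Option C would keep the first transaction exactly as above, but move the publish and mark-as-published steps into the dedicated EventPusher service.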
What are the pros and cons of each of the choices above? Is there another superior option I have yet to consider?
When you use event-based communication, a microservice publishes an event when something notable happens, such as when it updates a business entity. Other microservices subscribe to those events.
This means we can use different database technologies for different microservices, so one service may use an SQL database and another a NoSQL database. That flexibility allows each service to use the most efficient database for its requirements and functionality.
Modern microservice designs are reactive and event-driven. As a result, they are loosely coupled and easier to update and maintain.
Event sourcing persists the state of a business entity, such as an Order or a Customer, as a sequence of state-changing events. Whenever the state of a business entity changes, a new event is appended to the list of events. Since saving an event is a single operation, it is inherently atomic.
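As a minimal sketch of that idea (an in-memory SQLite table as the event store; the Order events and field names here are purely illustrative, not from any particular framework):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE order_events (
        seq INTEGER PRIMARY KEY,
        order_id TEXT NOT NULL,
        payload TEXT NOT NULL
    )
""")

def append_event(order_id: str, event: dict) -> None:
    """Appending the event IS the write; there is no separate current-state row."""
    with db:
        db.execute("INSERT INTO order_events (order_id, payload) VALUES (?, ?)",
                   (order_id, json.dumps(event)))

def load_order(order_id: str) -> dict:
    """Rebuild the current state of an Order by replaying its events in order."""
    state = {"status": None, "lines": []}
    rows = db.execute("SELECT payload FROM order_events WHERE order_id = ? ORDER BY seq",
                      (order_id,))
    for (payload,) in rows:
        event = json.loads(payload)
        if event["type"] == "OrderCreated":
            state["status"] = "created"
        elif event["type"] == "OrderLineAdded":
            state["lines"].append(event["line"])
        elif event["type"] == "OrderShipped":
            state["status"] = "shipped"
    return state

append_event("order-1", {"type": "OrderCreated"})
append_event("order-1", {"type": "OrderLineAdded", "line": {"sku": "ABC", "qty": 2}})
print(load_order("order-1"))  # {'status': 'created', 'lines': [{'sku': 'ABC', 'qty': 2}]}
```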
I have been wondering the same thing.
Apparently, there are a number of ways to deal with the atomicity of updating the database and publishing the corresponding event:
- Event sourcing
- Application events
- Database triggers
- Transaction log tailing
(Pattern: Event-driven architecture)
The Application events pattern sounds similar to your ideas.
An example could be:
The Order Service inserts a row into the ORDER table and inserts an Order Created event into the EVENT table [in the scope of a single local db transaction].
The Event Publisher thread or process queries the EVENT table for unpublished events, publishes the events, and then updates the EVENT table to mark the events as published.
(Event-Driven Data Management for Microservices)
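A rough sketch of that publisher loop (not from the article itself; it reuses the kind of events table from your option B, with sqlite3 and a stand-in `publish_to_bus()`):

```python
import json
import sqlite3
import time

def publish_to_bus(event: dict) -> None:
    """Stand-in for a real event bus client."""
    print("published:", json.dumps(event))

def run_event_publisher(db_path: str = "orders.db", poll_interval: float = 1.0) -> None:
    """Relay loop: find unpublished events, publish them, then mark them as published."""
    db = sqlite3.connect(db_path)
    while True:
        rows = db.execute(
            "SELECT id, payload FROM events WHERE published = 0 ORDER BY id"
        ).fetchall()
        for event_id, payload in rows:
            publish_to_bus(json.loads(payload))
            # A crash between the publish above and the UPDATE below means the
            # event is published again on the next pass (hence the need for
            # de-duplication on the subscriber side).
            with db:
                db.execute("UPDATE events SET published = 1 WHERE id = ?", (event_id,))
        time.sleep(poll_interval)  # or wake up early on a notification, as in your option C
```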
If at any point the Event Publisher crashes or otherwise fails, the events it did not process are still marked as unpublished.
So when the Event Publisher comes back online it will immediately publish those events.
If the Event Publisher published an event and then crashed before marking the event as published, the event might be published more than once.
For this reason, it is important that subscribers de-duplicate received messages.
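One simple way for a subscriber to do that is to record the ids of events it has already handled, in the same transaction as its own side effects (the processed_events table and handler below are illustrative):

```python
import sqlite3

db = sqlite3.connect("subscriber.db")
db.execute("CREATE TABLE IF NOT EXISTS processed_events (event_id INTEGER PRIMARY KEY)")

def handle_event(event_id: int, event: dict) -> None:
    """Idempotent consumer: duplicate deliveries hit the primary key and are ignored."""
    try:
        with db:
            db.execute("INSERT INTO processed_events (event_id) VALUES (?)", (event_id,))
            # ... apply the event to this service's own data in the same transaction ...
    except sqlite3.IntegrityError:
        pass  # already processed this event; safe to drop the duplicate
```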
Additionally, the answer to a Stack Overflow question that might sound quite different from yours, but is in essence asking the same thing, links to a couple of relevant blog posts.