 

Seeding microservices databases

Given service A (CMS), which controls a model (Product; let's assume its only fields are id, title and price), and services B (Shipping) and C (Emails), which have to display that model, what approach should be used to synchronize Product information across these services in an event sourcing architecture? Let's assume that the product catalog rarely changes (but does change) and that admins access shipment and email data very often (example functionality: B displays the titles of the products an order contained, and C displays the content of the shipping email that is going to be sent). Each service has its own DB.

Solution 1

Send all the required information about the Product within the event. This means the following structure for order_placed:

{
    order_id: [guid],
    product: {
        id: [guid],
        title: 'Foo',
        price: 1000
    }
}

Services B and C store the product information in a product JSON attribute on their orders table.

As such, only data retrieved from the event is used to display the necessary information.
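A minimal sketch of how service B might consume the order_placed event above, assuming SQLite as B's database; the table layout and the function names (create_schema, handle_order_placed, product_title) are hypothetical and only illustrate that display data comes from the stored event payload:

```python
import json
import sqlite3

def create_schema(db: sqlite3.Connection) -> None:
    # B's own orders table; the product snapshot is kept as a JSON text column.
    db.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, product TEXT NOT NULL)")

def handle_order_placed(db: sqlite3.Connection, event: dict) -> None:
    # Store exactly what the event carried; nothing is fetched from service A.
    db.execute(
        "INSERT INTO orders (order_id, product) VALUES (?, ?)",
        (event["order_id"], json.dumps(event["product"])),
    )

def product_title(db: sqlite3.Connection, order_id: str) -> str:
    # Display data comes solely from the stored event payload.
    row = db.execute(
        "SELECT product FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    return json.loads(row[0])["title"]

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_schema(db)
    handle_order_placed(db, {
        "order_id": "o-1",
        "product": {"id": "p-1", "title": "Foo", "price": 1000},
    })
    print(product_title(db, "o-1"))  # Foo
```

Any field the event did not carry (for example a color added to Product later) is simply absent from the stored JSON, which is the limitation described next.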

Problems: depending on what other information needs to be presented in B and C, the amount of data in the event can grow. B and C might not require the same information about the Product, but the event will have to contain both (unless we split it into two events). If some data is not present in the event, code cannot use it: if we add a color option to the Product, then for existing orders in B and C the product will be colorless unless we update the events and replay them.
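The option of splitting the event in two can be sketched as the producer deriving one narrower payload per consumer. The event names (order_placed.shipping, order_placed.emails) and the per-consumer field lists are hypothetical; this illustrates the trade-off rather than recommending it over a single event:

```python
from typing import Any

# Hypothetical per-consumer field whitelists; A decides what each consumer receives.
CONSUMER_FIELDS: dict[str, tuple[str, ...]] = {
    "order_placed.shipping": ("id", "title"),
    "order_placed.emails": ("id", "title", "price"),
}

def build_events(order_id: str, product: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return one payload per event name, each carrying only its consumer's fields."""
    return {
        name: {
            "order_id": order_id,
            "product": {field: product[field] for field in fields},
        }
        for name, fields in CONSUMER_FIELDS.items()
    }

if __name__ == "__main__":
    product = {"id": "p-1", "title": "Foo", "price": 1000}
    for name, payload in build_events("o-1", product).items():
        print(name, payload)
```

Splitting keeps each payload small, but A now has to know every consumer's needs, and a field added later (such as color) still only reaches events published after the change.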

Solution 2

Send only the product's guid within the event. This means the following structure for order_placed:

{
    order_id: [guid],
    product_id: [guid]
}

Services B and C store only the product_id attribute on their orders table.

When product information is required, services B and C retrieve it by performing an API call to the A/product/[guid] endpoint.

Problems: this makes B and C dependent on A at all times. If the Product schema changes on A, all services that depend on it have to be changed at the same time.
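To make that runtime coupling concrete, here is a sketch of B resolving product_id on demand. The injected fetch function stands in for a GET to A/product/[guid]; its name, the ShippingView class, and the use of ConnectionError to signal that A is unreachable are assumptions made for illustration:

```python
from typing import Callable

# Stand-in type for an HTTP call to A/product/[guid] (hypothetical).
FetchProduct = Callable[[str], dict]

class ShippingView:
    def __init__(self, fetch_product: FetchProduct) -> None:
        self.fetch_product = fetch_product
        self.orders: dict[str, str] = {}  # order_id -> product_id, as stored by B

    def handle_order_placed(self, event: dict) -> None:
        # Only the identifier is persisted locally.
        self.orders[event["order_id"]] = event["product_id"]

    def product_title(self, order_id: str) -> str:
        # Every display needs a round trip to A; if A is down, this raises.
        product = self.fetch_product(self.orders[order_id])
        return product["title"]

def make_fake_a(up: bool) -> FetchProduct:
    # Simulates service A being reachable or not.
    catalog = {"p-1": {"id": "p-1", "title": "Foo", "price": 1000}}

    def fetch(product_id: str) -> dict:
        if not up:
            raise ConnectionError("service A unavailable")
        return catalog[product_id]

    return fetch

if __name__ == "__main__":
    view = ShippingView(make_fake_a(up=True))
    view.handle_order_placed({"order_id": "o-1", "product_id": "p-1"})
    print(view.product_title("o-1"))  # Foo

    view.fetch_product = make_fake_a(up=False)
    try:
        view.product_title("o-1")
    except ConnectionError as exc:
        print("cannot render order:", exc)
```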

Solution 3

Send only the product's guid within the event. This means the following structure for order_placed:

{
    order_id: [guid],
    product_id: [guid]
}

Services B and C store product information in their own products table; orders still carry a product_id, but products data is replicated from A to B and C. B and C might hold different information about a Product than A does.

Product information is seeded when services B and C are created, and is updated whenever Product information changes, either by calling the A/product endpoint (which returns the required information for all products) or by accessing A's DB directly and copying the product information the given service requires.
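A sketch of this seed-then-update flow for service B, assuming SQLite for B's database, a list of dicts standing in for the response of A/product, and a hypothetical product_changed notification. B keeps only the column it needs (title), which is why its copy can differ from A's:

```python
import sqlite3

def create_replica_schema(db: sqlite3.Connection) -> None:
    # B's local replica holds only the fields B needs.
    db.execute("CREATE TABLE products (id TEXT PRIMARY KEY, title TEXT NOT NULL)")
    db.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, product_id TEXT NOT NULL)")

def seed_products(db: sqlite3.Connection, snapshot: list[dict]) -> None:
    # Initial load from A's listing; extra fields such as price are ignored.
    db.executemany(
        "INSERT OR REPLACE INTO products (id, title) VALUES (?, ?)",
        [(p["id"], p["title"]) for p in snapshot],
    )

def on_product_changed(db: sqlite3.Connection, change: dict) -> None:
    # Incremental update whenever A reports a change; the upsert keeps it idempotent.
    db.execute(
        "INSERT OR REPLACE INTO products (id, title) VALUES (?, ?)",
        (change["id"], change["title"]),
    )

def order_titles(db: sqlite3.Connection) -> list[tuple[str, str]]:
    # Display uses only local data: no call to A at read time.
    return db.execute(
        "SELECT o.order_id, p.title FROM orders o "
        "JOIN products p ON p.id = o.product_id ORDER BY o.order_id"
    ).fetchall()

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_replica_schema(db)
    seed_products(db, [{"id": "p-1", "title": "Foo", "price": 1000}])
    db.execute("INSERT INTO orders VALUES ('o-1', 'p-1')")
    on_product_changed(db, {"id": "p-1", "title": "Foo (v2)"})
    print(order_titles(db))  # [('o-1', 'Foo (v2)')]
```

Note the design consequence: an update overwrites the title shown for past orders too. If B must show the title as it was when the order was placed, that data has to travel with the order, as in Solution 1.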

Problems: this makes B and C dependent on A (when seeding). If the Product schema changes on A, all services that depend on it have to be changed (when seeding).


From my understanding, the correct approach would be to go with Solution 1 and either update the event history according to some rule (if the Product catalog hasn't changed and we want to display color, we can safely update the history with the current state of Products and fill in the missing data within the events), or cater for the absence of that data (if the Product catalog has changed and we want to display color, we can't be sure whether a given Product had a color at that point in the past; we can assume that all Products in the previous catalog were black and cater for this by updating the events or the code).
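The "cater for it in code" option is commonly implemented as an upcaster: stored events stay untouched, and older versions are converted to the current shape when they are read. This sketch assumes each event carries a hypothetical schema_version field (missing means version 1) and uses the black default from the example above; neither detail is prescribed by the question:

```python
from typing import Any, Callable

# Default for products recorded before color existed (assumption from the example).
DEFAULT_COLOR = "black"

def upcast_v1_to_v2(event: dict[str, Any]) -> dict[str, Any]:
    # v1 order_placed events have no product color; add the default
    # unless a color is somehow already present.
    product = {"color": DEFAULT_COLOR, **event["product"]}
    return {**event, "product": product, "schema_version": 2}

# Maps a stored version to the function that lifts it one version up.
UPCASTERS: dict[int, Callable[[dict[str, Any]], dict[str, Any]]] = {1: upcast_v1_to_v2}
CURRENT_VERSION = 2

def load_event(stored: dict[str, Any]) -> dict[str, Any]:
    """Return the event in the current schema without modifying the stored copy."""
    event = stored
    while event.get("schema_version", 1) < CURRENT_VERSION:
        event = UPCASTERS[event.get("schema_version", 1)](event)
    return event
```

Because the stored history is never rewritten, the assumed default can later be revised by changing only the upcaster.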

asked Feb 28 '20 at 12:02 by eithed




1 Answer

Solution #3 is really close to the right idea.

A way to think about this: B and C are each caching "local" copies of the data that they need. Messages processed at B (and likewise at C) use the locally cached information. Likewise, reports are produced using the locally cached information.

The data is replicated from the source to the caches via a stable API. B and C don't even need to use the same API; each uses whatever fetch protocol suits its needs. In effect, we define a contract (protocol and message schema) that constrains both the provider and the consumer. Any consumer of that contract can then be connected to any supplier. Backward-incompatible changes require a new contract.
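A sketch of what such a contract might look like from the consumer's side. The contract names (product.v1, product.v2), the message envelope and the required-field sets are hypothetical; the point is that a consumer checks the fields its contract guarantees, tolerates extra fields, and a backward-incompatible change appears as a new contract name rather than a silent change to an existing one:

```python
from typing import Any

# Hypothetical contracts: name -> fields the provider guarantees to send.
CONTRACTS: dict[str, frozenset[str]] = {
    "product.v1": frozenset({"id", "title", "price"}),
    # v2 renames price to price_cents: incompatible, so it is a new contract.
    "product.v2": frozenset({"id", "title", "price_cents"}),
}

class ContractViolation(ValueError):
    pass

def accept(message: dict[str, Any], supported: set[str]) -> dict[str, Any]:
    """Validate a message against the contract it declares; ignore unknown extra fields."""
    contract = message.get("contract")
    if contract not in supported:
        raise ContractViolation(f"unsupported contract: {contract!r}")
    body = message["body"]
    missing = CONTRACTS[contract].difference(body)
    if missing:
        raise ContractViolation(f"{contract} message missing {sorted(missing)}")
    # Tolerant reader: keep only what the contract guarantees.
    return {field: body[field] for field in CONTRACTS[contract]}
```

A consumer that learns to handle product.v2 adds it to its supported set; until then, v2 messages are rejected loudly instead of being misread.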

Services choose the cache invalidation strategy appropriate to their needs. This might mean pulling changes from the source on a regular schedule, in response to a notification that things may have changed, or even "on demand", acting as a read-through cache that falls back to the stored copy of the data when the source is not available.
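A sketch of the "on demand" strategy: a read-through cache that falls back to the stored copy when the source is unavailable. The fetch callable stands in for whatever protocol B uses to reach A, max_age is an arbitrary staleness threshold, and an in-memory dict replaces B's database; all of these are assumptions for illustration:

```python
import time
from typing import Callable, Optional

class ReadThroughProductCache:
    def __init__(self, fetch: Callable[[str], dict], max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch          # call to the source of truth (service A)
        self.max_age = max_age      # seconds before a local copy counts as stale
        self.clock = clock
        self.store: dict[str, tuple[float, dict]] = {}  # id -> (fetched_at, product)

    def get(self, product_id: str) -> dict:
        cached: Optional[tuple[float, dict]] = self.store.get(product_id)
        if cached is not None and self.clock() - cached[0] < self.max_age:
            return cached[1]                    # fresh enough: no call to A
        try:
            product = self.fetch(product_id)    # stale or missing: ask A
        except ConnectionError:
            if cached is not None:
                return cached[1]                # A is down: serve the stored copy
            raise                               # nothing stored yet: cannot answer
        self.store[product_id] = (self.clock(), product)
        return product
```

Scheduled pulls or change notifications would replace or supplement the max_age check; in every variant B keeps answering from its own copy while A is unavailable.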

This gives you "autonomy", in the sense that B and C can continue to deliver business value when A is temporarily unavailable.

Recommended reading: Data on the Outside versus Data on the Inside, Pat Helland, 2005.

answered Oct 11 '22 at 01:10 by VoiceOfUnreason