 

How to filter and sort data from multiple microservices?

We have microservices that work with different but related data, for example ads and their stats. We want to be able to filter, sort and aggregate this related data for the UI (and not only for it). For example, we want to show a user the ads that have 'car' in their text and more than 100 clicks.

Challenges:

  • There could be a lot of data. Some users have millions of rows after filtering.
  • Services don't have all the data. For example, for the statistics service an ad without stats is a non-existent ad: it doesn't know anything about such ads. But sorting and filtering should work anyway (an ad without stats should be treated as an ad with zero clicks).
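The second challenge amounts to an outer join with a default value. A minimal in-memory Python sketch of the example query ('car' in the text, more than 100 clicks), assuming the ads service returns Ad records and the statistics service returns a dict of click counts keyed by ad ID; Ad, query_ads and both data shapes are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Ad:
    ad_id: int
    text: str


def query_ads(
    ads: List[Ad],
    clicks_by_ad: Dict[int, int],
    text_term: str,
    min_clicks: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return (ad_id, clicks) for ads whose text contains text_term and,
    if min_clicks is given, that have more than min_clicks clicks.

    clicks_by_ad only holds ads the statistics service knows about; an ad
    missing from it is treated as having zero clicks (a left outer join).
    """
    term = text_term.lower()
    rows = [(ad, clicks_by_ad.get(ad.ad_id, 0)) for ad in ads]
    matching = [
        (ad, clicks)
        for ad, clicks in rows
        if term in ad.text.lower() and (min_clicks is None or clicks > min_clicks)
    ]
    # Most clicks first; ties broken by id so the order is deterministic.
    matching.sort(key=lambda row: (-row[1], row[0].ad_id))
    return [(ad.ad_id, clicks) for ad, clicks in matching]


ads = [
    Ad(1, "Red car for sale"),
    Ad(2, "Car parts"),
    Ad(3, "Bicycle"),
    Ad(4, "Vintage car"),  # the statistics service has never seen this ad
]
clicks_by_ad = {1: 150, 2: 90, 3: 500}

print(query_ads(ads, clicks_by_ad, "car", min_clicks=100))  # [(1, 150)]
print(query_ads(ads, clicks_by_ad, "car"))  # [(1, 150), (2, 90), (4, 0)]
```

Ad 4 never passes the "more than 100 clicks" filter, but it still appears, with 0 clicks, when the results are only sorted.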

Requirements:

  • Eventual consistency within a couple of seconds is OK
  • Data loss is not acceptable
  • Filtering and sorting that take 5 to 10 seconds for big clients with millions of rows are OK

Solutions we could think of:

  • Load all data required by a query from all services, then filter and sort it in memory.
  • Push updates from the services to Elasticsearch (or something similar). Elasticsearch handles the query and returns the IDs of the matching entities, which are then loaded from the services.
  • One big database, shared by all services, that holds everything.
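The second option splits a query in two: the search index does the filtering and sorting and returns only IDs, and the owning services supply the full entities. Below is a minimal Python sketch of that flow. FakeSearchIndex, upsert, search_ids and load_ads are hypothetical names; the dict-based index only stands in for Elasticsearch and does not model its real API.

```python
from typing import Dict, List


class FakeSearchIndex:
    """Hypothetical in-memory stand-in for a search index such as Elasticsearch.

    Each service pushes its own fields for an ad; partial updates are merged
    into one document per ad.
    """

    def __init__(self) -> None:
        self.docs: Dict[int, dict] = {}

    def upsert(self, ad_id: int, **fields) -> None:
        # An ad the statistics service has not reported on keeps clicks=0.
        self.docs.setdefault(ad_id, {"text": "", "clicks": 0}).update(fields)

    def search_ids(self, term: str, min_clicks: int, limit: int) -> List[int]:
        """Filter and sort inside the index, returning only ids."""
        term = term.lower()
        hits = [
            (doc["clicks"], ad_id)
            for ad_id, doc in self.docs.items()
            if term in doc["text"].lower() and doc["clicks"] > min_clicks
        ]
        hits.sort(key=lambda hit: (-hit[0], hit[1]))
        return [ad_id for _, ad_id in hits[:limit]]


def load_ads(ids: List[int], ads_service: Dict[int, dict]) -> List[dict]:
    """Load full entities from the owning service, keeping the index's order.

    The index is only eventually consistent, so an id may refer to an ad
    that has since been deleted; such ids are skipped.
    """
    return [ads_service[ad_id] for ad_id in ids if ad_id in ads_service]


index = FakeSearchIndex()
index.upsert(1, text="Red car for sale")  # pushed by the ads service
index.upsert(1, clicks=150)               # pushed by the statistics service
index.upsert(2, text="Vintage car")       # no stats yet, so 0 clicks
index.upsert(3, text="Sports car", clicks=300)

# Ad 3 was deleted in the ads service but the index has not caught up yet.
ads_service = {1: {"id": 1, "text": "Red car for sale"},
               2: {"id": 2, "text": "Vintage car"}}

ids = index.search_ids("car", min_clicks=100, limit=20)
print(ids)                         # [3, 1]
print(load_ads(ids, ads_service))  # [{'id': 1, 'text': 'Red car for sale'}]
```

Note that skipping stale IDs means a page can come back shorter than `limit`; a real implementation would have to decide whether to re-query or accept that.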

What should we pay attention to? Are there other ways to solve our problem?

Artem Malinko asked Jan 26 '18 09:01




1 Answer

You could use CQRS (Command Query Responsibility Segregation). In this low-level architecture, the model used for writing data is separate from the model used to read/query data. The write model is the canonical source of information: it is the source of truth.

The write model publishes events that are interpreted/projected by one or more read models, in an eventually consistent manner. Those events could even be published to a message queue and consumed by external read models (other microservices). There is no 1:1 mapping from write to read: you can have one model for writing and three models for reading. Each read model is optimized for its use case. This is the part that interests you: a speed-optimized read model.
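To make the "no 1:1 mapping" point concrete, here is a minimal Python sketch in which a single event stream feeds two differently shaped read models. The event types (AdCreated, AdClicked), the Event class and both view classes are hypothetical, and a plain list stands in for the message queue.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Event:
    kind: str      # hypothetical event types: "AdCreated" or "AdClicked"
    ad_id: int
    user_id: int
    text: str = ""


class ClicksPerUserView:
    """Read model 1: total clicks per advertiser (user)."""

    def __init__(self) -> None:
        self.totals: Dict[int, int] = defaultdict(int)

    def apply(self, event: Event) -> None:
        if event.kind == "AdClicked":
            self.totals[event.user_id] += 1


class TextSearchView:
    """Read model 2: inverted index from lower-cased word to ad ids."""

    def __init__(self) -> None:
        self.ads_by_word: Dict[str, Set[int]] = defaultdict(set)

    def apply(self, event: Event) -> None:
        if event.kind == "AdCreated":
            for word in event.text.lower().split():
                self.ads_by_word[word].add(event.ad_id)


def project(events: Iterable[Event], read_models: List) -> None:
    # Every read model consumes the same stream and builds its own shape.
    for event in events:
        for model in read_models:
            model.apply(event)


events = [
    Event("AdCreated", ad_id=1, user_id=42, text="Red car"),
    Event("AdCreated", ad_id=2, user_id=7, text="Car parts"),
    Event("AdClicked", ad_id=1, user_id=42),
    Event("AdClicked", ad_id=1, user_id=42),
    Event("AdClicked", ad_id=2, user_id=7),
]
clicks_view, search_view = ClicksPerUserView(), TextSearchView()
project(events, [clicks_view, search_view])

print(dict(clicks_view.totals))                # {42: 2, 7: 1}
print(sorted(search_view.ads_by_word["car"]))  # [1, 2]
```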

An optimized read model has everything it needs when it answers queries. The data is fully denormalized (so no joins are needed) and already indexed.
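A minimal Python sketch of such a read model, assuming ad text and click counts arrive as separate updates. Each row already holds both fields, so no join happens at query time, and a sorted (clicks, ad_id) list plays the role of the prebuilt index, turning "more than N clicks, sorted by clicks" into a binary search plus a range scan. AdsReadModel and its methods are hypothetical; a real store would use a database index rather than a Python list.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class AdsReadModel:
    """Denormalized rows (text and clicks together, so no joins) plus a
    sorted (clicks, ad_id) list that plays the role of a prebuilt index."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}
        self.by_clicks: List[Tuple[int, int]] = []

    def upsert(self, ad_id: int, text: Optional[str] = None,
               clicks: Optional[int] = None) -> None:
        row = self.rows.get(ad_id)
        if row is None:
            # An ad with no stats yet starts at zero clicks.
            row = {"text": "", "clicks": 0}
            self.rows[ad_id] = row
            bisect.insort(self.by_clicks, (0, ad_id))
        if text is not None:
            row["text"] = text
        if clicks is not None:
            # Keep the index in sync (list.remove is O(n); fine for a sketch,
            # a real store would use a B-tree or similar).
            self.by_clicks.remove((row["clicks"], ad_id))
            row["clicks"] = clicks
            bisect.insort(self.by_clicks, (clicks, ad_id))

    def query(self, term: str, min_clicks: int) -> List[Tuple[int, int]]:
        """(ad_id, clicks) with more than min_clicks clicks, most clicks first."""
        # Binary search skips every entry with clicks <= min_clicks.
        start = bisect.bisect_right(self.by_clicks, (min_clicks, float("inf")))
        term = term.lower()
        return [
            (ad_id, clicks)
            for clicks, ad_id in reversed(self.by_clicks[start:])
            if term in self.rows[ad_id]["text"].lower()
        ]


model = AdsReadModel()
model.upsert(1, text="Red car for sale")
model.upsert(1, clicks=150)
model.upsert(2, text="Vintage car")  # never receives stats
model.upsert(3, text="Sports car", clicks=300)
model.upsert(4, text="Bicycle", clicks=500)

print(model.query("car", min_clicks=100))  # [(3, 300), (1, 150)]
print(model.query("car", min_clicks=-1))   # [(3, 300), (1, 150), (2, 0)]
```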

A read model can have its data sharded. You do this to minimize the collection size (a small collection is faster to query than a big one). In your case, you could shard by user: each user would have their own collection of statistics (i.e. a table in SQL or a document collection in NoSQL). You can use the built-in sharding of the database, or you can shard manually by splitting the data into separate collections (tables).
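A Python sketch of manual sharding by user, assuming the statistics read model stores click counts per ad. Each user's data lives in its own collection, so a query for one big client never scans other users' rows. ShardedStats, shard_table_name and the table-naming scheme are hypothetical; dicts stand in for the per-user tables.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def shard_table_name(user_id: int) -> str:
    # Hypothetical naming scheme for "one table per user" in SQL.
    return f"ad_stats_user_{user_id}"


class ShardedStats:
    """Manually sharded statistics: one collection (here a dict) per user."""

    def __init__(self) -> None:
        self.shards: Dict[str, Dict[int, int]] = defaultdict(dict)

    def record_click(self, user_id: int, ad_id: int) -> None:
        shard = self.shards[shard_table_name(user_id)]
        shard[ad_id] = shard.get(ad_id, 0) + 1

    def top_ads(self, user_id: int, limit: int) -> List[Tuple[int, int]]:
        # Only this user's shard is read; other users' rows are never scanned.
        shard = self.shards.get(shard_table_name(user_id), {})
        return sorted(shard.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


stats = ShardedStats()
for ad_id in (1, 1, 1, 2):
    stats.record_click(user_id=42, ad_id=ad_id)
stats.record_click(user_id=7, ad_id=9)

print(stats.top_ads(42, limit=10))  # [(1, 3), (2, 1)]
print(sorted(stats.shards))         # ['ad_stats_user_42', 'ad_stats_user_7']
```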

Regarding "Services don't have all the data":

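A read model can subscribe to many sources of truth (e.g. microservices or event streams). For example, it could consume events from both the ads service and the statistics service, and treat an ad for which no stats have arrived as an ad with zero clicks.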
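A minimal Python sketch of a read model with two subscriptions, assuming the ads service emits {"ad_id", "text"} events and the statistics service emits {"ad_id", "clicks"} events carrying the current total (both event shapes are assumptions, as is the AdsWithStatsView class). Because each event overwrites a value instead of incrementing it, a redelivered event is harmless, and the two streams may interleave in any order.

```python
from typing import Dict, Tuple


class AdsWithStatsView:
    """Read model subscribed to two sources of truth: the ads service and
    the statistics service. Their events may interleave in any order."""

    def __init__(self) -> None:
        self.text: Dict[int, str] = {}    # fed by the ads service
        self.clicks: Dict[int, int] = {}  # fed by the statistics service

    def on_ads_event(self, event: dict) -> None:
        self.text[event["ad_id"]] = event["text"]

    def on_stats_event(self, event: dict) -> None:
        # Assumed event shape: the current total, not an increment, so a
        # redelivered event (at-least-once delivery) does not double-count.
        self.clicks[event["ad_id"]] = event["clicks"]

    def rows(self) -> Dict[int, Tuple[str, int]]:
        # The ads service decides which ads exist; an ad the statistics
        # service never reported on is shown with zero clicks.
        return {ad_id: (text, self.clicks.get(ad_id, 0))
                for ad_id, text in self.text.items()}


view = AdsWithStatsView()
view.on_stats_event({"ad_id": 2, "clicks": 120})  # stats arrive first
view.on_ads_event({"ad_id": 1, "text": "Red car"})
view.on_ads_event({"ad_id": 2, "text": "Car parts"})
view.on_stats_event({"ad_id": 2, "clicks": 120})  # duplicate delivery

print(view.rows())  # {1: ('Red car', 0), 2: ('Car parts', 120)}
```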

One particular case that works very well with CQRS is event sourcing: it has the advantage that you have all the events from the beginning of time, without the need to store them in a persistent message queue.
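With event sourcing, a new or changed read model can be built by replaying the stored history from the very first event. A minimal Python sketch, assuming a hypothetical event store that is simply a list of (kind, ad_id) tuples and a hypothetical rebuild_clicks projection:

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event store record: (kind, ad_id).
EventRecord = Tuple[str, int]


def rebuild_clicks(event_store: Iterable[EventRecord]) -> Dict[int, int]:
    """Build a clicks-per-ad read model from scratch by replaying history."""
    clicks: Dict[int, int] = {}
    for kind, ad_id in event_store:
        if kind == "AdCreated":
            clicks.setdefault(ad_id, 0)  # an ad with no clicks still exists
        elif kind == "AdClicked":
            clicks[ad_id] = clicks.get(ad_id, 0) + 1
    return clicks


event_store = [
    ("AdCreated", 1),
    ("AdCreated", 2),
    ("AdClicked", 1),
    ("AdClicked", 1),
]
print(rebuild_clicks(event_store))  # {1: 2, 2: 0}
```

A read model added later, or one whose schema changed, is built the same way: point a new projection function at the full history.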

P.S. I can't think of a use case in which a read model could not be made fast enough, given enough hardware resources.

Constantin Galbenu answered Oct 24 '22 05:10