
How does Docker Swarm handle database (PostgreSQL) replication?

I'm learning Docker Swarm mode and I managed to create a Swarm locally with a web application and a PostgreSQL database. I can scale them and I see Swarm creating replicas.

I think I understand how Docker Swarm can load balance regular web servers, but how does it deal out of the box with database containers?

Outside of the Swarm context, databases usually have their own ways to deal with replication, in the form of plugins or extended products like MySQL Cluster. Other databases, like Cassandra, have replication built directly into the product. In a Swarm context, do we still need to rely on those database plugins and features?

What is the expected pattern to handle data consistency between replicas of a database container?

I know it's a very open-ended question, but Docker's documentation is very open-ended too and I can't seem to find anything specific to this.

pfernandom asked Nov 03 '16 17:11



2 Answers

How does it deal out of the box with database containers?

It doesn't.

The Docker documentation has a pretty good description of Swarm services in "How services work":

When you deploy the service to the swarm, the swarm manager accepts your service definition as the desired state for the service. Then it schedules the service on nodes in the swarm as one or more replica tasks.

Swarm has no idea what's inside a task. All it knows is how many instances there are, whether those instances are passing their health checks, and whether there are enough of them to satisfy the service definition you gave it. The overlap in the word "replica" between Swarm and databases is a little unfortunate, but they are different concepts.
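To make the distinction concrete, here is a toy model in Python of what "satisfying the desired replica count" means. It is only a sketch and not Swarm's actual scheduler: the `Task` class and `reconcile` function are hypothetical names, and it assumes each task keeps its data in its own container filesystem with no shared volume.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One running container; `data` stands in for the database files inside it."""
    task_id: int
    healthy: bool = True
    data: list = field(default_factory=list)


def reconcile(tasks, desired_replicas):
    """Toy reconciliation: keep healthy tasks, then start fresh ones until the count matches."""
    running = [t for t in tasks if t.healthy]
    next_id = max((t.task_id for t in tasks), default=0) + 1
    while len(running) < desired_replicas:
        # A new task starts empty; nothing copies data from its siblings.
        running.append(Task(task_id=next_id))
        next_id += 1
    return running[:desired_replicas]


tasks = reconcile([], desired_replicas=2)
tasks[0].data.append("row written through task 1")

# Scaling up only changes the number of tasks, not what they contain.
tasks = reconcile(tasks, desired_replicas=3)
row_counts = [len(t.data) for t in tasks]
print(row_counts)
```

The orchestrator in this model only counts healthy tasks. The row written through the first task never reaches the others, which is the gap that database-level replication has to fill.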

What is the expected pattern to handle data consistency between replicas of a database container?

Setting up data replication is on you. These are probably as good a place to start as any:

  • How to Set Up PostgreSQL for High Availability and Replication with Hot Standby
  • PostgreSQL Replication Example
Roman answered Oct 12 '22 15:10


Docker Swarm currently scales well for stateless applications. For database replication, you have to rely on each database's own replication mechanism; Swarm cannot manage database replication. Volume-level or file-system-level replication can protect a single-instance database, but it is not aware of database replication or clustering.

Databases such as PostgreSQL require additional work. There are a few options:

  1. Use the host's local directory. You will need to create one service for every replica and use a constraint to schedule each container on one specific host. You will also need a custom PostgreSQL Docker image to set up PostgreSQL replication among the replicas. However, when a node goes down, its PostgreSQL replica goes down with it, and you will need to bring up another replica yourself. See Crunchy Data's example.

  2. Use a volume plugin, such as Flocker or REX-Ray. You will still need to create one service for every replica and bind one volume to each service. Create all the services in the same overlay network and configure the PostgreSQL replicas to talk to each other via DNS name (the Docker service name of each replica). You will still need to set up PostgreSQL replication among the replicas.
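As a rough illustration of the "one service per replica" pattern in both options, the Python sketch below only builds the `docker service create` command lines; it does not run them. Everything specific in it is an assumption made for the example: the image name, the `pg-net` overlay network (which would have to exist already), the node hostnames, the host directory, the volume names, and the `rexray` driver name. The custom image is assumed to configure PostgreSQL replication itself, since Swarm will not.

```python
import shlex

PG_DATA = "/var/lib/postgresql/data"           # data directory of the official postgres image
IMAGE = "example/postgres-replication:latest"  # hypothetical custom image that sets up replication
NETWORK = "pg-net"                             # hypothetical overlay network, created beforehand


def bind_mount(host_dir):
    """Option 1: mount a directory from the host the task is pinned to."""
    return f"type=bind,source={host_dir},target={PG_DATA}"


def plugin_volume(volume, driver):
    """Option 2: mount a named volume provided by a volume plugin."""
    return f"type=volume,source={volume},target={PG_DATA},volume-driver={driver}"


def service_create_args(name, mount, host=None):
    """Build the argv for one single-replica service; pin it to `host` if given."""
    args = ["docker", "service", "create",
            "--name", name,
            "--replicas", "1",
            "--network", NETWORK,
            "--mount", mount]
    if host is not None:
        args += ["--constraint", f"node.hostname=={host}"]
    args.append(IMAGE)
    return args


# Option 1: one service per replica, each pinned to a specific (hypothetical) node.
option_1 = [service_create_args(f"pg-{i}", bind_mount("/srv/pgdata"), host=f"node-{i}")
            for i in (1, 2)]

# Option 2: one service per replica, each bound to its own plugin-backed volume.
option_2 = [service_create_args(f"pg-{i}", plugin_volume(f"pgdata-{i}", "rexray"))
            for i in (1, 2)]

for args in option_1 + option_2:
    print(shlex.join(args))
```

Because each service has its own name on the shared overlay network, the replicas can reach each other as `pg-1` and `pg-2`. The replication setup between them still has to come from the image or from manual configuration.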

CloudStax answered Oct 12 '22 16:10