Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

manually create replication slot for publication in PostgreSQL 10

I am trying to get a stream of updates for certain tables from my PostgreSQL database. The regular way of getting all updates looks like this:

You create a logical replication slot

pg_create_logical_replication_slot('my_slot', 'wal2json');

And either connect to it using pg_recvlogical or making special SQL queries. This allows you to get all the actions from the database in json (if you used wal2json plugin or similar) and then do whatever you want with that data.

But in PostgreSQL 10 we have Publication/Subscription mechanism which allows us to replicate selected tables only. This is very handy because a lot of useless data is not being sent. The process looks like this:

First, you create a publication

CREATE PUBLICATION foo FOR TABLE herp, derp;

Then you subscribe to that publication from another database

CREATE SUBSCRIPTION mysub CONNECTION <connection stuff> PUBLICATION foo;

This creates a replication slot on a master database under the hood and starts listening to updates and commit them to the same tables on a second database. This is fine if your job was to replicate some tables, but want to get a raw stream for my stuff.

As I mentioned, the CREATE SUBSCRIPTION query is creating a replication slot on the master database under the hood, but how can I create one manually without the subscription and a second database? Here the docs say:

To make this work, create the replication slot separately (using the function pg_create_logical_replication_slot with the plugin name pgoutput)

According to the docs, this is possible, but pg_create_logical_replication_slot only creates a regular replication slot. Is the pgoutput plugin responsible for all the magic? If yes, then it becomes impossible to use other plugins like wal2json with publications.

What am I missing here?

like image 769
George Avatar asked Mar 16 '18 14:03

George


People also ask

What is logical replication slot in Postgres?

A replication slot has an identifier that is unique across all databases in a PostgreSQL cluster. Slots persist independently of the connection using them and are crash-safe. A logical slot will emit each change just once in normal operation.

How do you set up logical replication?

To set up logical replication with pglogical, logical decoding must be enabled on the primary instance. Set cloudsql. logical_decoding=on on the Cloud SQL instance, or wal_level=logical on an external instance. Additionally, pglogical must be enabled on both the primary and replica instance; set cloudsql.

How do I drop a replication slot in PostgreSQL 12?

The slot to be dropped must be inactive before it can be removed. It is not possible to forcibly drop an active replication slot.

How do I replicate a Postgres database?

For replication to be achieved in PostgreSQL there must be two servers that can communicate with each other. It will identify this server as master, which is the master server or the production server and the other one is the Slave server or Replica server or standby server that will have a copy of master server data.


1 Answers

I have limited experience with logical replication and logical decoding in Postgres, so please correct me if below is wrong. That being said, here is what I have found:

  1. Publication support is provided by pgoutput plugin. You use it via plugin-specific options. It may be that other plugins have possibility to add the support, but I do not know whether the logical decoding plugin interface exposes sufficient details. I tested it against wal2json plugin version 9e962ba and it doesn't recognize this option.
  2. Replication slots are created independently from publications. Publications to be used as a filter are specified when fetching changes stream. It is possible to peek changes for one publication, then peek changes for another publication and observe different set of changes despite using the same replication slot (I did not find it documented and I was testing on Aurora with Postgres compatibility, so behavior could potentially vary).
  3. Plugin output seems to include all entries for begin and commit, even if transaction did not touch any of tables included in publication of interest. It does not however include changes to other tables than included in the publication.

Here is an example how to use it in Postgres 10+:

-- Create publication
CREATE PUBLICATION cdc;

-- Create slot
SELECT pg_create_logical_replication_slot('test_slot_v1', 'pgoutput');

-- Create example table
CREATE TABLE replication_test_v1
(
  id integer NOT NULL PRIMARY KEY,
  name text
);

-- Add table to publication
ALTER PUBLICATION cdc ADD TABLE replication_test_v1;

-- Insert example data
INSERT INTO replication_test_v1(id, name) VALUES
  (1, 'Number 1')
;

-- Peak changes (does not consume changes)
SELECT pg_logical_slot_peek_binary_changes('test_slot_v1', NULL, NULL, 'publication_names', 'cdc', 'proto_version', '1');

-- Get changes (consumes changes)
SELECT pg_logical_slot_get_binary_changes('test_slot_v1', NULL, NULL, 'publication_names', 'cdc', 'proto_version', '1');

To stream changes out of Postgres to other systems, you can consider using Debezium project. It is an open source distributed platform for change data capture, which among others provides a PostgreSQL connector. In version 0.10 they added support for pgoutput plugin. Even if your use case is very different from what the project offers, you can look at their code to see how they interact with replication API.

like image 146
krzychu Avatar answered Sep 29 '22 22:09

krzychu