I am trying to get a stream of updates for certain tables from my PostgreSQL database. The regular way of getting all updates looks like this:
You create a logical replication slot
pg_create_logical_replication_slot('my_slot', 'wal2json');
And either connect to it using pg_recvlogical
or making special SQL queries. This allows you to get all the actions from the database in json (if you used wal2json plugin or similar) and then do whatever you want with that data.
But in PostgreSQL 10 we have Publication/Subscription mechanism which allows us to replicate selected tables only. This is very handy because a lot of useless data is not being sent. The process looks like this:
First, you create a publication
CREATE PUBLICATION foo FOR TABLE herp, derp;
Then you subscribe to that publication from another database
CREATE SUBSCRIPTION mysub CONNECTION <connection stuff> PUBLICATION foo;
This creates a replication slot on a master database under the hood and starts listening to updates and commit them to the same tables on a second database. This is fine if your job was to replicate some tables, but want to get a raw stream for my stuff.
As I mentioned, the CREATE SUBSCRIPTION
query is creating a replication slot on the master database under the hood, but how can I create one manually without the subscription and a second database? Here the docs say:
To make this work, create the replication slot separately (using the function pg_create_logical_replication_slot with the plugin name pgoutput)
According to the docs, this is possible, but pg_create_logical_replication_slot
only creates a regular replication slot. Is the pgoutput
plugin responsible for all the magic? If yes, then it becomes impossible to use other plugins like wal2json
with publications.
What am I missing here?
A replication slot has an identifier that is unique across all databases in a PostgreSQL cluster. Slots persist independently of the connection using them and are crash-safe. A logical slot will emit each change just once in normal operation.
To set up logical replication with pglogical, logical decoding must be enabled on the primary instance. Set cloudsql. logical_decoding=on on the Cloud SQL instance, or wal_level=logical on an external instance. Additionally, pglogical must be enabled on both the primary and replica instance; set cloudsql.
The slot to be dropped must be inactive before it can be removed. It is not possible to forcibly drop an active replication slot.
For replication to be achieved in PostgreSQL there must be two servers that can communicate with each other. It will identify this server as master, which is the master server or the production server and the other one is the Slave server or Replica server or standby server that will have a copy of master server data.
I have limited experience with logical replication and logical decoding in Postgres, so please correct me if below is wrong. That being said, here is what I have found:
pgoutput
plugin. You use it via plugin-specific options. It may be that other plugins have possibility to add the support, but I do not know whether the logical decoding plugin interface exposes sufficient details. I tested it against wal2json
plugin version 9e962ba
and it doesn't recognize this option.Here is an example how to use it in Postgres 10+:
-- Create publication
CREATE PUBLICATION cdc;
-- Create slot
SELECT pg_create_logical_replication_slot('test_slot_v1', 'pgoutput');
-- Create example table
CREATE TABLE replication_test_v1
(
id integer NOT NULL PRIMARY KEY,
name text
);
-- Add table to publication
ALTER PUBLICATION cdc ADD TABLE replication_test_v1;
-- Insert example data
INSERT INTO replication_test_v1(id, name) VALUES
(1, 'Number 1')
;
-- Peak changes (does not consume changes)
SELECT pg_logical_slot_peek_binary_changes('test_slot_v1', NULL, NULL, 'publication_names', 'cdc', 'proto_version', '1');
-- Get changes (consumes changes)
SELECT pg_logical_slot_get_binary_changes('test_slot_v1', NULL, NULL, 'publication_names', 'cdc', 'proto_version', '1');
To stream changes out of Postgres to other systems, you can consider using Debezium project. It is an open source distributed platform for change data capture, which among others provides a PostgreSQL connector. In version 0.10 they added support for pgoutput
plugin. Even if your use case is very different from what the project offers, you can look at their code to see how they interact with replication API.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With