
How to Process PostgreSQL Triggers in a Distributed Environment

We're in the process of implementing PostgreSQL triggers to monitor inserts/updates/deletes on several tables, so that another app that listens for these events can keep our relational database in sync with our full-text search database.

Here's what the trigger function looks like:

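CREATE FUNCTION notification() RETURNS trigger AS $$
BEGIN
  -- A DELETE has no NEW row, so the id has to come from OLD
  IF TG_OP = 'DELETE' THEN
    PERFORM pg_notify('search', TG_TABLE_NAME || ',id,' || OLD.id);
  ELSE
    PERFORM pg_notify('search', TG_TABLE_NAME || ',id,' || NEW.id);
  END IF;
  RETURN NULL;  -- the return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;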
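On the worker side, the payload built by this function is a plain string such as foo,id,42. A minimal sketch of decoding it, assuming table names never contain commas; parseSearchPayload is a hypothetical helper, not part of the original code:

```javascript
// Hypothetical helper: decodes a payload built as
// TG_TABLE_NAME || ',id,' || <id>, e.g. "foo,id,42".
// Assumes table names never contain commas.
function parseSearchPayload(payload) {
  var parts = payload.split(',');
  if (parts.length !== 3 || parts[1] !== 'id') {
    throw new Error('Unexpected payload: ' + payload);
  }
  // The id stays a string; convert it if your search client needs a number
  return { table: parts[0], id: parts[2] };
}

// Inside client.on('notification', ...) you would call parseSearchPayload(msg.payload)
console.log(parseSearchPayload('foo,id,42')); // { table: 'foo', id: '42' }
```

A fixed-position string format like this is easy to produce in SQL, but it breaks as soon as a name contains the separator; a JSON payload avoids that.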

And here's how we're adding the trigger to each table:

CREATE TRIGGER foo_trigger AFTER INSERT OR UPDATE OR DELETE ON foo
FOR EACH ROW EXECUTE PROCEDURE notification();

And here is a very basic example of how we would have a node app (worker) listening for these trigger events:

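var pg = require('pg');

var connString = "postgres://user@localhost/foo_local";

// A dedicated client rather than a pooled one: LISTEN only works while
// this connection stays open, so it must never be released to a pool.
var client = new pg.Client(connString);

client.connect(function(err) {
  if (err) throw err;

  client.on('notification', function(msg) {
    // msg.channel is 'search'; msg.payload is the string built by the trigger
    //get the added / updated / deleted record
    //sync it with the search database
  });

  client.query('LISTEN search', function(err) {
    if (err) throw err;
  });
});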

Here's my three-part question:

Part 1: Our app is load-balanced across several instances. What happens when the Node worker app, which is also distributed, receives an event? Will all listening instances of the worker app receive the triggered event?

If so, that's bad: we don't want all instances of the worker app to process every event, because they'd all be doing the same work, which would negate the benefit of having multiple listeners to distribute the load. How do we mitigate this?

Part 2: What happens if the worker receives a trigger event but processing it is long-running? Will PostgreSQL queue the triggered events until the listeners receive them?

Part 3: We've got about 5 tables on which we want triggers to fire on INSERT / UPDATE / DELETE. We get a lot of requests, so this would fire many events in a short period of time. We need a worker to listen for these events and process the changed records so that it can send them on to the full-text search database. Is there a better way to architect this to handle the volume?

The other solution our team is considering is abandoning SQL triggers and just using a message queuing system to push messages into a data store (SQS or Redis), then having workers pick messages off the queue. We want to avoid this route if we can, as it adds more architecture to our platform; however, we're prepared to do it if it's our only option.

Your thoughts would be much appreciated.

asked May 05 '15 23:05 by doremi


People also ask

How do I run a trigger in PostgreSQL?

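Syntax:

CREATE TRIGGER trigger_name {BEFORE | AFTER | INSTEAD OF} event_name
ON table_name
[FOR EACH ROW | FOR EACH STATEMENT]
EXECUTE PROCEDURE function_name();

Here, event_name can be INSERT, UPDATE, DELETE, or TRUNCATE on the table table_name (several events can be combined with OR). The trigger logic itself does not go inside CREATE TRIGGER: it lives in a separate trigger function, created beforehand with CREATE FUNCTION ... RETURNS trigger, which the trigger executes. FOR EACH ROW is optional; without it, the trigger fires once per statement.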

Are Postgres triggers transactional?

All PostgreSQL triggers execute in the same transaction as the statement that fired them. You can also use LISTEN/NOTIFY to send a message from your trigger to code that runs outside the transaction. In that case, the message is delivered only when the transaction commits successfully; if the transaction rolls back, no notification is sent.

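What is an INSTEAD OF trigger in PostgreSQL?

INSTEAD OF triggers are row-level triggers that can only be defined on views; they run in place of the INSERT, UPDATE, or DELETE on the view, which is a way to make a view updatable. INSTEAD OF triggers do not support WHEN conditions. By contrast, row-level BEFORE triggers are typically used for checking or modifying the data that will be inserted or updated. For example, a BEFORE trigger might be used to insert the current time into a timestamp column, or to check that two elements of the row are consistent.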


1 Answer

First of all, in your trigger function, you might want to make life easier for your listeners by providing more specific details of exactly what changed (e.g. in an UPDATE).

You could do something like this:

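-- Requires the hstore extension (CREATE EXTENSION hstore;) for the UPDATE diff,
-- and PostgreSQL 9.4+ for json_build_object.
-- NOTIFY payloads must be shorter than 8000 bytes in the default configuration;
-- a larger payload (e.g. a wide row) makes pg_notify raise an error,
-- which aborts the transaction that fired the trigger.
CREATE OR REPLACE FUNCTION notification() RETURNS trigger AS $$
DECLARE
  id bigint;
BEGIN
  -- NEW exists for INSERT/UPDATE, OLD exists for UPDATE/DELETE
  IF TG_OP = 'INSERT' OR TG_OP = 'UPDATE' THEN
    id := NEW.id;
  ELSE
    id := OLD.id;
  END IF;

  IF TG_OP = 'UPDATE' THEN
    -- hstore(NEW) - hstore(OLD) keeps only the columns whose values changed
    PERFORM pg_notify('table_update', json_build_object('schema', TG_TABLE_SCHEMA, 'table', TG_TABLE_NAME, 'id', id, 'type', TG_OP, 'changes', hstore_to_json(hstore(NEW) - hstore(OLD)))::text);
    RETURN NEW;
  END IF;

  IF TG_OP = 'INSERT' THEN
    PERFORM pg_notify('table_update', json_build_object('schema', TG_TABLE_SCHEMA, 'table', TG_TABLE_NAME, 'id', id, 'type', TG_OP, 'row', row_to_json(NEW))::text);
    RETURN NEW;
  END IF;

  IF TG_OP = 'DELETE' THEN
    PERFORM pg_notify('table_update', json_build_object('schema', TG_TABLE_SCHEMA, 'table', TG_TABLE_NAME, 'id', id, 'type', TG_OP, 'row', row_to_json(OLD))::text);
    RETURN OLD;
  END IF;

  RETURN NULL;  -- not reached for row-level INSERT/UPDATE/DELETE triggers
END;
$$ LANGUAGE plpgsql;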
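On the listening side, each message on the table_update channel is a JSON string. A minimal sketch of turning one into a search-index action, assuming the payload shape built by the function above; toSearchAction and the op names (index / update / delete) are hypothetical placeholders for calls to your actual full-text search client:

```javascript
// Hypothetical handler for JSON payloads sent on the 'table_update' channel.
// The returned action objects stand in for real search-client calls.
function toSearchAction(payload) {
  var event = JSON.parse(payload);
  var key = event.schema + '.' + event.table + ':' + event.id;

  switch (event.type) {
    case 'INSERT':
      // The full row is available: index the whole document
      return { op: 'index', key: key, doc: event.row };
    case 'UPDATE':
      // Only the changed columns (with their new values) are sent
      return { op: 'update', key: key, fields: event.changes };
    case 'DELETE':
      return { op: 'delete', key: key };
    default:
      throw new Error('Unknown operation: ' + event.type);
  }
}

var sample = '{"schema":"public","table":"foo","id":42,"type":"UPDATE","changes":{"title":"new"}}';
console.log(toSearchAction(sample));
```

Because hstore_to_json renders every value as a string, the fields in an UPDATE's changes arrive as strings and may need converting back to numbers or booleans before indexing.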

Now for your questions, or at least the first one:

Part 1: Yes, every session that has executed LISTEN on a channel receives each notification sent on it, so all the listening instances of the worker app will get every triggered event. This is useful for pub/sub-style real-time notification of multiple listeners. For your use case, though, it sounds like you would need to add some kind of queue package on top of the basic PostgreSQL LISTEN/NOTIFY, such as queue_classic (for Ruby) or perhaps pg-jobs for node.js.
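Since every listening worker receives every notification, one lighter-weight alternative to a queue package (a different technique, static partitioning) is to have each of N workers act only on the events whose id maps to its own index and skip the rest. A minimal sketch, assuming a fixed worker count known to every instance and integer ids; WORKER_INDEX, WORKER_COUNT and shouldHandle are hypothetical names:

```javascript
// Static partitioning sketch: every worker still receives every notification,
// but each one only acts on ids where id mod WORKER_COUNT equals its index.
// WORKER_INDEX / WORKER_COUNT are hypothetical settings read from env vars.
var WORKER_COUNT = Number(process.env.WORKER_COUNT || 3);
var WORKER_INDEX = Number(process.env.WORKER_INDEX || 0);

function shouldHandle(id, workerIndex, workerCount) {
  // Assumes integer ids; all events for a given row map to the same worker
  return Number(id) % workerCount === workerIndex;
}

// Which of ids 1..6 would this worker process?
var mine = [1, 2, 3, 4, 5, 6].filter(function (id) {
  return shouldHandle(id, WORKER_INDEX, WORKER_COUNT);
});
console.log(mine);
```

The trade-off is that nothing redistributes work: if one worker is down or restarting, its share of events is simply lost, because NOTIFY does not store messages for disconnected listeners. Handling that failure case is what a real queue adds.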

Anyway, since it's several months since you asked this, I'm wondering what path you took in the end and how it worked out? Can you share your experience and insights?

answered Oct 20 '22 01:10 by Yoni Rabinovitch