 

How can I fire a trigger at the end of a chain of updates?

I have a couple of tables that interact with each other using triggers, and the current way I've been handling the trigger execution uses pg_trigger_depth() < 2, which is ugly. I really want the final trigger to run only once, at the end, after all the per-row work has happened. Unfortunately, CONSTRAINT TRIGGERs are FOR EACH ROW only, and FOR EACH STATEMENT triggers fire once for every statement, including the statements executed inside the trigger functions, not just once for the initial statement that started the chain.

I've looked through several other SO questions around the topic, and haven't found something similar enough to what I'm doing.

Here is the setup:

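CREATE EXTENSION IF NOT EXISTS btree_gist; -- GiST support for the = comparisons in the EXCLUDE constraints
CREATE TABLE report(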
  report_tk SERIAL PRIMARY KEY,
  report_id UUID NOT NULL,
  report_name TEXT NOT NULL,
  report_data INT NOT NULL,
  report_subscribers TEXT[] NOT NULL DEFAULT ARRAY[]::TEXT[],
  valid_range TSTZRANGE NOT NULL DEFAULT '(,)',
  EXCLUDE USING GIST ((report_id :: TEXT) WITH =, report_name WITH =, valid_range WITH &&)
);
CREATE TABLE report_subscriber(
  report_id INT NOT NULL REFERENCES report ON DELETE CASCADE,
  subscriber_name TEXT NOT NULL,
  needs_sync BOOLEAN NOT NULL DEFAULT TRUE,
  valid_range TSTZRANGE NOT NULL DEFAULT '(,)',
  EXCLUDE USING GIST (subscriber_name WITH =, valid_range WITH &&)
);
CREATE OR REPLACE FUNCTION sync_subscribers_to_report()
  RETURNS TRIGGER LANGUAGE plpgsql SET SEARCH_PATH TO dwh, public AS $$
BEGIN
  RAISE INFO 'Running sync to report trigger';

  BEGIN
    CREATE TEMPORARY TABLE lock_sync_subscribers_to_report(
    ) ON COMMIT DROP;
    RAISE INFO 'syncing to report, stack depth is: %', pg_trigger_depth();
    UPDATE report r
    SET report_subscribers = x.subscribers
    FROM (
           SELECT
             s.report_id AS report_tk
             , array_agg(DISTINCT s.subscriber_name ORDER BY s.subscriber_name) AS subscribers
           FROM report_subscriber s
           WHERE s.report_id IN (
             SELECT DISTINCT s2.report_id
             FROM report_subscriber s2
             WHERE s2.needs_sync
           )
           GROUP BY s.report_id
         ) x
    WHERE r.report_tk = x.report_tk;
    RAISE INFO 'turning off sync flag, stack depth is: %', pg_trigger_depth();
    UPDATE report_subscriber
    SET needs_sync = FALSE
    WHERE needs_sync = TRUE;
    RETURN NULL;
  EXCEPTION WHEN DUPLICATE_TABLE THEN
    RAISE INFO 'skipping recursive call, stack depth is: %', pg_trigger_depth();
    RETURN NULL;
  END;
END;
$$;
CREATE TRIGGER sync_subscribers_to_report
  AFTER INSERT OR UPDATE OR DELETE
  ON report_subscriber
  FOR STATEMENT
EXECUTE PROCEDURE sync_subscribers_to_report();

So with this setup, I'd like to be able to:

  • insert a report record
  • guarantee that a report name can only exist once at any single point in time (the EXCLUDE on valid_range)
  • insert a report subscriber in the subscribers table
  • guarantee that a subscriber cannot subscribe to more than one report at a time.
  • allow more than one person to subscribe to a report.
  • whenever a record is added to the subscriber table, add the name to the list of subscribers in the report table.
  • whenever a record is deleted from the subscriber table, remove the name from the list of subscribers in the report table.
  • whenever a record is deleted from the report table, delete the corresponding subscriber records (taken care of by the ON DELETE CASCADE).

If there are a lot of edits to the subscriber table in a single statement (the common case), it would be best to just run one simple query to update the report table using the aggregation of the new and remaining records from the subscriber table.

My original solution involved adding a needs_sync flag to the subscriber table and triggering off of that to do the update, then turning the flag off. Of course, turning the flag off causes another firing of the trigger, which I stopped with pg_trigger_depth() < 2 (the 2 is because the inserts can be caused by some other trigger in the system). Besides being ugly, it is also annoying that the statements in the trigger functions cause yet more FOR EACH STATEMENT firings to occur.

I tried a different version of the flag using a trick I saw in another SO answer (https://stackoverflow.com/a/8950639/2340769): creating a temp table and catching the duplicate_table exception to prevent further executions. I don't think it really improves the issue much, though.

Is there a way to do what I'm trying to do in a clean manner? While this is an obvious toy example, my real application does need to build that "packed array" representation of the data, and it would be great to do so in an efficient manner.

deinspanjer asked Dec 17 '17 06:12


1 Answer

Rather than using a flag in report_subscriber itself, I think you'd be better off with a separate queue of pending changes. This has a few benefits:

  • No trigger recursion
  • Under the hood, a PostgreSQL UPDATE is effectively a DELETE plus a re-INSERT (it writes a complete new row version), so inserting into a queue will actually be cheaper than flipping a flag
  • Possibly quite a bit cheaper, since you only need to queue the distinct report_ids, rather than cloning entire report_subscriber records, and you can do it in a temp table, so the storage is contiguous and nothing needs to be synced to disk
  • No race conditions to worry about when flipping the flags, as the queue is local to the current transaction (in your implementation, the records affected by the UPDATE report_subscriber are not necessarily the same records you picked up in the SELECT...)

So, initialise the queue table:

CREATE FUNCTION create_queue_table() RETURNS TRIGGER LANGUAGE plpgsql AS $$
BEGIN
  CREATE TEMP TABLE pending_subscriber_changes(report_id INT UNIQUE) ON COMMIT DROP;
  RETURN NULL;
END
$$;

CREATE TRIGGER create_queue_table_if_not_exists
  BEFORE INSERT OR UPDATE OF report_id, subscriber_name OR DELETE
  ON report_subscriber
  FOR EACH STATEMENT
  WHEN (to_regclass('pending_subscriber_changes') IS NULL)
  EXECUTE PROCEDURE create_queue_table();

...queue up changes as they arrive, ignoring anything already queued:

CREATE FUNCTION queue_subscriber_change() RETURNS TRIGGER LANGUAGE plpgsql AS $$
BEGIN
  IF TG_OP IN ('DELETE', 'UPDATE') THEN
    INSERT INTO pending_subscriber_changes (report_id) VALUES (old.report_id)
    ON CONFLICT DO NOTHING;
  END IF;

  IF TG_OP IN ('INSERT', 'UPDATE') THEN
    INSERT INTO pending_subscriber_changes (report_id) VALUES (new.report_id)
    ON CONFLICT DO NOTHING;
  END IF;
  RETURN NULL;
END
$$;

CREATE TRIGGER queue_subscriber_change
  AFTER INSERT OR UPDATE OF report_id, subscriber_name OR DELETE
  ON report_subscriber
  FOR EACH ROW
  EXECUTE PROCEDURE queue_subscriber_change();

...and process the queue at the end of the statement:

CREATE FUNCTION process_pending_changes() RETURNS TRIGGER LANGUAGE plpgsql AS $$
BEGIN
  UPDATE report
  SET report_subscribers = ARRAY(
    SELECT DISTINCT subscriber_name
    FROM report_subscriber s
    WHERE s.report_id = report.report_tk
    ORDER BY subscriber_name
  )
  FROM pending_subscriber_changes c
  WHERE report.report_tk = c.report_id;

  DROP TABLE pending_subscriber_changes;
  RETURN NULL;
END
$$;

CREATE TRIGGER process_pending_changes
  AFTER INSERT OR UPDATE OF report_id, subscriber_name OR DELETE
  ON report_subscriber
  FOR EACH STATEMENT
  EXECUTE PROCEDURE process_pending_changes();
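A quick way to see the per-statement version at work, as a sketch: it assumes the question's two tables exist in working form (btree_gist installed, report_subscriber.report_id referencing report.report_tk), the three triggers above are installed, and PostgreSQL 13 or later for the built-in gen_random_uuid(). The report and subscriber names are made up for illustration.

```sql
DO $$
DECLARE
  tk   INT;     -- surrogate key of the demo report
  subs TEXT[];  -- snapshot of report.report_subscribers
BEGIN
  INSERT INTO report (report_id, report_name, report_data)
  VALUES (gen_random_uuid(), 'demo report', 0)
  RETURNING report_tk INTO tk;

  -- One multi-row INSERT: the row trigger queues tk once (ON CONFLICT DO NOTHING),
  -- and the statement trigger rebuilds the array once, at the end of the statement.
  INSERT INTO report_subscriber (report_id, subscriber_name)
  VALUES (tk, 'bob'), (tk, 'alice');

  SELECT report_subscribers INTO subs FROM report WHERE report_tk = tk;
  ASSERT subs = ARRAY['alice', 'bob'], format('after insert: %s', subs);

  -- Deleting a subscriber queues the same report again and the name disappears.
  DELETE FROM report_subscriber WHERE report_id = tk AND subscriber_name = 'bob';

  SELECT report_subscribers INTO subs FROM report WHERE report_tk = tk;
  ASSERT subs = ARRAY['alice'], format('after delete: %s', subs);

  -- The statement trigger drops the queue table once it has been processed.
  ASSERT to_regclass('pending_subscriber_changes') IS NULL;
END
$$;
```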

There's a slight problem with this: UPDATE doesn't offer any guarantees about the update order. This means that, if these two statements were run simultaneously:

INSERT INTO report_subscriber (report_id, subscriber_name) VALUES (1, 'a'), (2, 'b');
INSERT INTO report_subscriber (report_id, subscriber_name) VALUES (2, 'x'), (1, 'y');

...then there's a chance of a deadlock, if they attempt to update the report records in opposite orders. You can avoid this by enforcing a consistent ordering for all updates, but unfortunately there's no way to attach an ORDER BY to an UPDATE statement; I think you need to resort to cursors:

CREATE FUNCTION process_pending_changes() RETURNS TRIGGER LANGUAGE plpgsql AS $$
DECLARE
  target_report CURSOR FOR
    SELECT report_tk
    FROM report
    WHERE report_tk IN (TABLE pending_subscriber_changes)
    ORDER BY report_tk
    FOR NO KEY UPDATE;
BEGIN
  FOR target_record IN target_report LOOP
    UPDATE report
    SET report_subscribers = ARRAY(
        SELECT DISTINCT subscriber_name
        FROM report_subscriber
        WHERE report_id = target_record.report_tk
        ORDER BY subscriber_name
      )
    WHERE CURRENT OF target_report;
  END LOOP;

  DROP TABLE pending_subscriber_changes;
  RETURN NULL;
END
$$;
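If you would rather keep a single set-based UPDATE, a common alternative (a different technique from the cursor loop above, not part of the original answer) is to take the row locks in a fixed order first, using SELECT ... ORDER BY ... FOR NO KEY UPDATE. PostgreSQL applies the ORDER BY before it locks the returned rows, so concurrent transactions request the locks in the same sequence. A sketch, assuming the same queue table and columns as above:

```sql
CREATE OR REPLACE FUNCTION process_pending_changes() RETURNS TRIGGER LANGUAGE plpgsql AS $$
BEGIN
  -- Lock every affected report row in ascending report_tk order.
  -- PERFORM runs the query to completion and discards the result rows.
  PERFORM 1
  FROM report
  WHERE report_tk IN (TABLE pending_subscriber_changes)
  ORDER BY report_tk
  FOR NO KEY UPDATE;

  -- This transaction already holds all the row locks it needs,
  -- so the order in which the UPDATE visits the rows no longer matters.
  UPDATE report r
  SET report_subscribers = ARRAY(
    SELECT DISTINCT s.subscriber_name
    FROM report_subscriber s
    WHERE s.report_id = r.report_tk
    ORDER BY s.subscriber_name
  )
  WHERE r.report_tk IN (TABLE pending_subscriber_changes);

  DROP TABLE pending_subscriber_changes;
  RETURN NULL;
END
$$;
```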

This still has the potential to deadlock if the client tries to run multiple statements within the same transaction (as the update ordering is only applied within each statement, but the update locks are held until commit). You can work around this (sort of) by firing off process_pending_changes() just once at the end of the transaction (the drawback is that, within that transaction, you won't see your own changes reflected in the report_subscribers array).

Here's a generic outline for an "on commit" trigger, if you think it's worth the trouble to fill it in:

CREATE FUNCTION run_on_commit() RETURNS TRIGGER LANGUAGE plpgsql AS $$
BEGIN
  -- your code goes here
  RETURN NULL;
END
$$;

CREATE FUNCTION trigger_already_fired() RETURNS BOOLEAN LANGUAGE plpgsql VOLATILE AS $$
DECLARE
  already_fired BOOLEAN;
BEGIN
  already_fired := NULLIF(current_setting('my_vars.trigger_already_fired', TRUE), '');
  IF already_fired IS TRUE THEN
    RETURN TRUE;
  ELSE
    SET LOCAL my_vars.trigger_already_fired = TRUE;
    RETURN FALSE;
  END IF;
END
$$;

CREATE CONSTRAINT TRIGGER my_trigger
  AFTER INSERT OR UPDATE OR DELETE ON my_table
  DEFERRABLE INITIALLY DEFERRED
  FOR EACH ROW
  WHEN (NOT trigger_already_fired())
  EXECUTE PROCEDURE run_on_commit();
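Filled in for this schema, the on-commit variant can reuse the pieces above. This is a sketch, not something from the original answer: it assumes the create_queue_table_if_not_exists and queue_subscriber_change triggers stay installed, that process_pending_changes() (either version) and trigger_already_fired() exist, and sync_subscribers_on_commit is a hypothetical trigger name.

```sql
-- Stop rebuilding report_subscribers at the end of every statement...
DROP TRIGGER IF EXISTS process_pending_changes ON report_subscriber;

-- ...so the queue table now survives across statements and accumulates every
-- affected report_id in the transaction. Rebuild the arrays once, at commit.
-- process_pending_changes() drops pending_subscriber_changes when it finishes,
-- which is harmless here because the table was created ON COMMIT DROP anyway.
CREATE CONSTRAINT TRIGGER sync_subscribers_on_commit
  AFTER INSERT OR UPDATE OF report_id, subscriber_name OR DELETE ON report_subscriber
  DEFERRABLE INITIALLY DEFERRED
  FOR EACH ROW
  WHEN (NOT trigger_already_fired())   -- evaluated immediately, so only the first row queues an event
  EXECUTE PROCEDURE process_pending_changes();
```

Because the WHEN condition of a constraint trigger is evaluated when the row is modified rather than at commit, only one deferred event is queued per transaction. One consequence to be aware of: the flag stays set for the rest of the transaction, so if deferred triggers are forced to fire early with SET CONSTRAINTS ALL IMMEDIATE, changes made after that point will not queue another run.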
Nick Barnes answered Sep 23 '22 00:09