
Trigger alternatives for two tables that have to mutually update each other

(Sorry for the long post, but I guess all the information is really necessary)

We have two tables - task and subtask. Each task consists of one or more subtasks, and each of these objects has a start date, an end date and a duration. Additionally, subtasks have an ordering.

Tables

create table task (
  pk number not null primary key, 
  name varchar2(30) not null,
  start_date date,
  duration_in_days number,
  end_date date,
  needs_recomputation number default 0
);

create table subtask (
  pk number not null primary key, 
  task_fk number references task(pk),
  name varchar2(30) not null,
  start_date date,
  duration_in_days number,
  end_date date,
  ordering number not null
);

Business rules

  • the first subtask has the same start date as the task
  • for each subsequent subtask, its start date equals the end date of the predecessor
  • the last subtask has the same end date as the task
  • for each subtask and task: start_date + duration = end_date
  • for the task: duration = sum(duration of subtasks)
  • the end date and duration for the task cannot be changed directly (thank God!)

This directly generates the following requirements for updates/deletes:

  • when the task's start date is changed, the start date of its first subtask is set to the same value, and the start date and end date for all subtasks is re-computed
  • when the start date, end date or duration of a subtask is changed, its other fields are updated accordingly, all subsequent subtasks are updated accordingly, and finally, the task is updated accordingly
  • when a subtask is deleted, all subsequent subtasks are updated accordingly, and finally, the task is updated accordingly
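As an aside, the per-row rule (start_date + duration = end_date) could at least be enforced declaratively, independent of any trigger logic. A sketch (the constraint names are made up, and this is not part of the current schema; the null handling may need adjusting if partially filled rows are allowed):

```sql
-- Sketch: enforce start_date + duration_in_days = end_date declaratively,
-- so no code path can ever store an inconsistent row.
alter table subtask add constraint chk_subtask_dates
  check (start_date + duration_in_days = end_date);

alter table task add constraint chk_task_dates
  check (start_date + duration_in_days = end_date);
```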

Current approach

  • the task table has a trigger that updates the first subtask and set the needs_recomputation flag when the start date is changed
  • the subtask table has a trigger that keeps start date/end date/duration consistent and sets the needs_recomputation flag for the parent task (we cannot directly update the subsequent subtasks here because of the mutating table problem)
  • to avoid a trigger cascade, each trigger sets a package variable to indicate that no other triggers should be firing
  • a dbms_scheduler job periodically checks the task table and re-computes the data for tasks whose needs_recomputation flag is set

This (kind of) works, but it has several drawbacks:

  • we might get inconsistent data if several people simultaneously change data for the same task (see AskTom on problems with triggers)
  • after an update on the subtask table, we have a short time period where the data is inconsistent (until the next time the sync job runs). Currently, we manually run the job after each change action in the GUI, but this is obviously error-prone

So my question is - is there any sensible alternative approach for this?

Package

create or replace package pkg_task is

  g_update_in_progress boolean;
  procedure recomputeDates(p_TaskID in task.pk%TYPE);

  procedure recomputeAllDates;
end;

create or replace package body pkg_task is

  procedure recomputeDates(p_TaskID in task.pk%TYPE) is
  begin
    g_update_in_progress := true;
    -- update the subtasks
    merge into subtask tgt
    using (select pk,
                  start_date,
                  duration_in_days,
                  end_date,
                  sum(duration_in_days) over(partition by task_fk order by ordering) as cumulative_duration,
                  min(start_date) over(partition by task_fk) + sum(duration_in_days) over(partition by task_fk order by ordering rows between unbounded preceding and 1 preceding) as new_start_date,
                  min(start_date) over(partition by task_fk) + sum(duration_in_days) over(partition by task_fk order by ordering) as new_end_date
             from subtask s
            where s.task_fk = p_TaskID
            order by task_fk,
                     ordering) src
    on (src.pk = tgt.pk)
    when matched then
      update
         set tgt.start_date = nvl(src.new_start_date,
                                  src.start_date),
             tgt.end_date   = nvl(src.new_end_date,
                                  src.end_date);
    -- update the task                                  
    merge into task tgt
    using (select p_TaskID as pk,
                  min(s.start_date) as new_start_date,
                  max(s.end_date) as new_end_date,
                  sum(s.duration_in_days) as new_duration
             from subtask s
            where s.task_fk = p_TaskID) src
    on (tgt.pk = src.pk)
    when matched then
      update
         set tgt.start_date          = src.new_start_date,
             tgt.end_date            = src.new_end_date,
             tgt.duration_in_days    = src.new_duration,
             tgt.needs_recomputation = 0;
    g_update_in_progress := false;
  exception
    when others then
      -- make sure the flag is reset even if one of the merges fails,
      -- otherwise the triggers stay disabled for the rest of the session
      g_update_in_progress := false;
      raise;
  end;

  procedure recomputeAllDates is
  begin
    for cur in (select pk
                  from task t
                 where t.needs_recomputation = 1)
    loop
      recomputeDates(cur.pk);
    end loop;
  end;

begin
  g_update_in_progress := false;
end;

Triggers

create or replace trigger trg_task
before update on task
for each row
  begin
    if (:new.start_date <> :old.start_date and not pkg_task.g_update_in_progress) then
      pkg_task.g_update_in_progress := true;
      -- set the start date for the first subtask
      update subtask s 
      set s.start_date = :new.start_date
      where s.task_fk = :new.pk
      and s.ordering = 1;
      :new.needs_recomputation := 1;
      pkg_task.g_update_in_progress := false;      
    end if;
  end;

create or replace trigger trg_subtask
  before update on subtask
  for each row
declare
  l_date_changed boolean := false;
begin
  if (not pkg_task.g_update_in_progress) then
    pkg_task.g_update_in_progress := true;

    if (:new.start_date <> :old.start_date) then
      :new.end_date  := :new.start_date + :new.duration_in_days;
      l_date_changed := true;
    end if;
    if (:new.end_date <> :old.end_date) then
      :new.duration_in_days := :new.end_date - :new.start_date;
      l_date_changed        := true;
    end if;
    if (:new.duration_in_days <> :old.duration_in_days) then
      :new.end_date  := :new.start_date + :new.duration_in_days;
      l_date_changed := true;
    end if;

    if l_date_changed then
      -- set the needs_recomputation flag for the parent task
      -- if this is the first subtask, set the parent's start date, as well
      update task t
         set t.start_date         =
             (case
               when :new.ordering = 1 then
                :new.start_date
               else
                t.start_date
             end),
             t.needs_recomputation = 1
       where t.pk = :new.task_fk;
    end if;
    pkg_task.g_update_in_progress := false;
  end if;
end;

Job

begin
  dbms_scheduler.create_job(
      job_name => 'JOB_SYNC_TASKS'
     ,job_type => 'PLSQL_BLOCK'
     ,job_action => 'begin pkg_task.recomputeAllDates; commit; end; '

     ,start_date      => to_timestamp_tz('2014-01-14 10:00:00 Europe/Berlin',
                                         'yyyy-mm-dd hh24:mi:ss tzr')
     ,repeat_interval => 'FREQ=HOURLY;BYMINUTE=0,5,10,15,20,25,30,35,40,45,50,55'
     ,enabled => TRUE
     ,comments => 'Task sync job, runs every 5 minutes');
end;
Frank Schmitt asked Jan 14 '14

1 Answer

Using triggers here is just asking for trouble.

Furthermore, using the scheduler is probably not the best idea, since scheduled jobs can only see committed data. So either you commit in the trigger, which throws transaction logic out of the window, or the recomputation is delayed until the end of the transaction.

You should either:

  1. Use procedures. The simplest answer. When you have multiple applications, they should not perform DML/business logic directly; they should always go through procedures so that they all run the same code. Forbid direct DML with grants or views. If you can't modify the applications, you may need to force the use of procedures through INSTEAD OF triggers on views.
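A minimal sketch of what such an entry point could look like, assuming the triggers are dropped and the existing pkg_task.recomputeDates is called synchronously (the procedure name and signature are made up):

```sql
create or replace procedure set_subtask_duration(
  p_subtask_pk in subtask.pk%type,
  p_duration   in subtask.duration_in_days%type
) is
  l_task_fk task.pk%type;
begin
  -- change the one field the caller is allowed to touch ...
  update subtask s
     set s.duration_in_days = p_duration,
         s.end_date         = s.start_date + p_duration
   where s.pk = p_subtask_pk
  returning s.task_fk into l_task_fk;

  -- ... and recompute everything else in the same transaction,
  -- so other sessions never see an inconsistent intermediate state
  pkg_task.recomputeDates(l_task_fk);
end;
```

Grant execute on this procedure (or a package of such procedures) instead of update on the tables, and every caller runs the same consistent recomputation.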

  2. Probably even better than procedures in your case: use a schema that doesn't contain duplicate data. You don't want to store redundant data: this makes application development more complex than needed. In terms of performance, resources and energy, the best way to solve a problem is when you realize that the task is unnecessary.

    From the description of your model, here are the columns that you could remove:

    • task.duration_in_days
    • task.end_date
    • task.needs_recomputation
    • subtask.start_date
    • subtask.end_date


    The task table would contain the start date only, and each subtask would only store its duration. When you need the aggregate information, use joins. You can use views to let the applications access the data transparently.
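A sketch of such a view for the slimmed-down schema, where task keeps only its start date and each subtask keeps only its duration and ordering (the view name is made up):

```sql
create or replace view subtask_v as
select s.pk,
       s.task_fk,
       s.name,
       s.ordering,
       s.duration_in_days,
       -- start = task start + total duration of all preceding subtasks
       t.start_date
         + nvl(sum(s.duration_in_days)
                 over (partition by s.task_fk
                       order by s.ordering
                       rows between unbounded preceding and 1 preceding),
               0) as start_date,
       -- end = task start + cumulative duration including this subtask
       t.start_date
         + sum(s.duration_in_days)
             over (partition by s.task_fk order by s.ordering) as end_date
  from subtask s
  join task t
    on t.pk = s.task_fk;
```

A similar aggregate view over task (summing the subtask durations) replaces task.duration_in_days and task.end_date. Since the dates are always derived, they can never drift out of sync, and the whole trigger/flag/job machinery disappears.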

  3. Use a mutating trigger workaround that uses package variables to identify modified rows with BEFORE and AFTER statement triggers. Obviously this will involve lots of code that will be hard to code, test and maintain so you should use options (1) and (2) whenever possible instead.
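On 11g and later, option (3) is usually written as a single compound trigger rather than separate row and statement triggers with package variables. A minimal sketch (the trigger name is made up; it reuses the existing g_update_in_progress guard to avoid recursive firing, and it still has the concurrency caveats discussed in the AskTom link):

```sql
create or replace trigger trg_subtask_recompute
  for update of start_date, end_date, duration_in_days on subtask
  compound trigger

  -- collect the parent tasks touched by this statement
  type t_pk_tab is table of task.pk%type index by pls_integer;
  g_task_pks t_pk_tab;

  after each row is
  begin
    if not pkg_task.g_update_in_progress then
      g_task_pks(g_task_pks.count + 1) := :new.task_fk;
    end if;
  end after each row;

  -- the subtask table is no longer mutating here, so it can be updated
  after statement is
  begin
    for i in 1 .. g_task_pks.count loop
      pkg_task.recomputeDates(g_task_pks(i));
    end loop;
  end after statement;
end trg_subtask_recompute;
```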

Vincent Malgrat answered Sep 30 '22