Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Adding a version/history system to database table [closed]

I am working on a project where I have to add a form of data versioning or history feature to every table. Basically we have to keep track of every insert or change in the database, so that it is easy to roll back to or view previous versions of the data in each table.

The way my project manager envisions this is to be done is to add a few new coloumns to each table. The main feature is a coloumn named "version". Every time an update is made, nothing is really updated, the old row remains but a new row is added to the table with an incremented value for "version".

To show the current data, we just use a view that shows only rows with the highest version number of each type.

While this works great as far as moving back and forth between different versions, I have run across a problem with this approach. For any relationship to exist between tables, we need to define foreign keys and foreign keys can only reference unique fields in the other table. Now that we are keeping several versions of the same row (with the same 'Id' as it is basically the same piece of data as far as our application is concerned) we can no longer use another table's 'Id' as a foreign key for a table.

We are using a unique primary key field for each row, but that is useless as an identifier since several rows are basically different versions of the same thing. We could manually keep track of the latest version of each kind of entry and update corresponding foreign key relationships every time something is changed, but that looks like a a lot of work and I am not sure that will always even work (e.g. reverting to previous version of an entry might cause the foreign key to reference an old and unusable version of another entry in some other table.)

I know there are other ways of keeping a history of database updates (e.g. by using a separate history table for each table), but I am stuck with this approach in this project. Is there some more obvious way of handling relationships between tables like this that I am missing?

Note: I am using MS SQL Server 2008 R2.

like image 954
MAK Avatar asked Feb 11 '11 11:02

MAK


1 Answers

You say that not want a "separate revision table", not voting to FractalizeR's solution because this. Ok, here is a "one table solution"... But, please, simplify/generalise your question, for better answers and better use of this page for all visitors: I think your problem is about "revision control" on SQL tables.

The solution for "ISO 2008 SQL", then I think it is also for Microsoft SQL-Server. I tested it on PostgreSQL 9.1.

In this kind of problem we can use a SQL View to "emulate" the original table, and the "versioned table" as a new one, with more attributes: * An new attribute moment for sort (ordering) the revisions and for time registering; * An new attribute cmd for "traceability" (not really necessary).

Suppose your original (and conventional) table is t. For revision control you must add new attributes, but other programmers not need to see this new attributes... The solution is to rename the table t to t_hist and offer to other programmers a SQL VIEW t (as a query over t_hist).

t is a VIEW for show a conventional table: only "current tuples". t_hist is the new table, with the "history tuples".

Suppose t with attributes a,b. PS: on t_hist I added isTop for better performance on t.

 -- ....
 CREATE TABLE  t_hist (
    -- the old attributes for t:
    id integer NOT NULL, -- a primary key of t
    a varchar(10),  -- any attribute
    b integer,      -- any attribute

    -- new attributes for revision control:
    isTop BOOLEAN NOT NULL DEFAULT true, -- "last version" or "top" indicator
    cmd varchar(60) DEFAULT 'INSERT',    -- for traceability
    moment timestamp NOT NULL DEFAULT now(), -- for sort revisions
    UNIQUE(id,moment)
);

CREATE VIEW t AS
  SELECT id,a,b FROM t_hist WHERE isTop;
   -- same, but better performance, as 
   -- SELECT id,a,b FROM t_hist GROUP BY id,a,b HAVING MAX(moment)=moment  

-- Verifies consistency in INSERT:
CREATE FUNCTION t_hist_uniq_trig() RETURNS TRIGGER AS $$
DECLARE
  aux BOOLEAN;
BEGIN
   SELECT true INTO aux FROM t_hist 
   WHERE id=NEW.id AND moment>=NEW.moment;
   IF found THEN -- want removes from top?
     RAISE EXCEPTION 'TRYING TO INCLUDE (ID=%) PREVIOUS TO %', NEW.id, NEW.moment;
    END IF;
    RETURN NEW;
END  $$ LANGUAGE plpgsql;
CREATE TRIGGER uniq_trigs BEFORE INSERT ON t_hist 
    FOR EACH ROW EXECUTE PROCEDURE t_hist_uniq_trig();   

CREATE FUNCTION t_reset_top(integer)  RETURNS BOOLEAN AS $BODY$
    UPDATE t_hist SET isTop=false WHERE isTop=true AND id=$1
    RETURNING true;  -- null se nao encontrado
$BODY$ LANGUAGE sql;

--------
-- Implements INSER/UPDATE/DELETE over VIEW t, 
-- and controls unique id of t:
CREATE OR REPLACE FUNCTION t_cmd_trig() RETURNS TRIGGER AS $$
DECLARE
  aux BOOLEAN;
BEGIN
  aux:=true;
  IF TG_OP = 'DELETE' OR TG_OP = 'UPDATE' THEN
    aux := t_reset_top(OLD.id); -- rets. true ou NULL
  ELSE
    SELECT true INTO aux FROM t_hist WHERE id=NEW.id AND isTop;
  END IF;
  IF (TG_OP='INSERT' AND aux IS NULL) OR (TG_OP='UPDATE' AND aux) THEN
    INSERT INTO t_hist (id,a,b,cmd) VALUES (NEW.id, NEW.a,NEW.b,TG_OP);
  ELSEIF TG_OP='DELETE' AND aux THEN -- if first delete
    UPDATE t_hist SET cmd=cmd||' AND DELETE AT '||now()
  ELSEIF TG_OP='INSERT' THEN -- fails by not-unique(id)
    RAISE EXCEPTION 'REGISTER ID=% EXIST', NEW.id;
  ELSEIF TG_OP='UPDATE' THEN -- .. redundance, a trigger not goes here
    RAISE EXCEPTION 'REGISTER ID=% NOT EXIST', NEW.id;
  END IF;
  RETURN NEW; -- discarded
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER ins_trigs INSTEAD OF INSERT OR UPDATE OR DELETE ON t 
    FOR EACH ROW EXECUTE PROCEDURE t_cmd_trig();

--  Examples:
INSERT INTO t(id,a,b) VALUES (1,'aaaaaa',3); -- ok
INSERT INTO t(id,a,b) VALUES (1,'bbbbbb',3); -- error
UPDATE t_hist SET a='teste' WHERE id=1;      -- ok
     -- SELECT * from t;        SELECT * from t_hist;
INSERT INTO t(id,a,b) VALUES 
  (2,'bbbbbb',22), -- ok
  (3,'bbbbbb',22), -- ok
  (4,'aaaaaa',2);  -- ok
DELETE FROM t WHERE id=3;
     -- SELECT * from t;        SELECT * from t_hist;

PS: I suggest not to try adapt this solution for one table without a view, your trigger will be very complex; neither to try adapt for t_hist inheriting t, where all content inserted in t_hist will copy to t.

like image 136
Peter Krauss Avatar answered Oct 04 '22 21:10

Peter Krauss