I think the key question to ask here is 'Who / What is going to be using the history'?
If it's going to be mostly for reporting / human readable history, we've implemented this scheme in the past...
Create a table called 'AuditTrail' or something that has the following fields...
[ID] [int] IDENTITY(1,1) NOT NULL,
[UserID] [int] NULL,
[EventDate] [datetime] NOT NULL,
[TableName] [varchar](50) NOT NULL,
[RecordID] [varchar](20) NOT NULL,
[FieldName] [varchar](50) NULL,
[OldValue] [varchar](5000) NULL,
[NewValue] [varchar](5000) NULL
You can then add a 'LastUpdatedByUserID' column to all of your tables which should be set every time you do an update / insert on the table.
You can then add a trigger to every table to catch any insert / update that happens and creates an entry in this table for each field that's changed. Because the table is also being supplied with the 'LastUpdateByUserID' for each update / insert, you can access this value in the trigger and use it when adding to the audit table.
We use the RecordID field to store the value of the key field of the table being updated. If it's a combined key, we just do a string concatenation with a '~' between the fields.
I'm sure this system may have drawbacks - for heavily updated databases the performance may be hit, but for my web-app, we get many more reads than writes and it seems to be performing pretty well. We even wrote a little VB.NET utility to automatically write the triggers based on the table definitions.
Just a thought!
CREATE VIEW EmployeeHistory
AS
, FirstName, , DepartmentId
SELECT EmployeeId, RevisionXML.value('(/employee/FirstName)[1]', 'varchar(50)') AS FirstName,
RevisionXML.value('(/employee/LastName)[1]', 'varchar(100)') AS LastName,
RevisionXML.value('(/employee/DepartmentId)[1]', 'integer') AS DepartmentId,
FROM EmployeeHistories
The History Tables article in the Database Programmer blog might be useful - covers some of the points raised here and discusses the storage of deltas.
Edit
In the History Tables essay, the author (Kenneth Downs), recommends maintaining a history table of at least seven columns:
Columns which never change, or whose history is not required, should not be tracked in the history table to avoid bloat. Storing the delta for numerical values can make subsequent queries easier, even though it can be derived from the old and new values.
The history table must be secure, with non-system users prevented from inserting, updating or deleting rows. Only periodic purging should be supported to reduce overall size (and if permitted by the use case).
We have implemented a solution very similar to the solution that Chris Roberts suggests, and that works pretty well for us.
Only difference is that we only store the new value. The old value is after all stored in the previous history row
[ID] [int] IDENTITY(1,1) NOT NULL,
[UserID] [int] NULL,
[EventDate] [datetime] NOT NULL,
[TableName] [varchar](50) NOT NULL,
[RecordID] [varchar](20) NOT NULL,
[FieldName] [varchar](50) NULL,
[NewValue] [varchar](5000) NULL
Lets say you have a table with 20 columns. This way you only have to store the exact column that has changed instead of having to store the entire row.
Avoid Design 1; it is not very handy once you will need to for example rollback to old versions of the records - either automatically or "manually" using administrators console.
I don't really see disadvantages of Design 2. I think the second, History table should contain all columns present in the first, Records table. E.g. in mysql you can easily create table with the same structure as another table (create table X like Y
). And, when you are about to change structure of the Records table in your live database, you have to use alter table
commands anyway - and there is no big effort in running these commands also for your History table.
Notes
RevisionId
column;ModifiedBy
- the user who created particular revision. You may also want to have a field DeletedBy
to track who deleted particular revision.DateModified
should mean - either it means where this particular revision was created, or it will mean when this particular revision was replaced by another one. The former requires the field to be in the Records table, and seems to be more intuitive at the first sight; the second solution however seems to be more practical for deleted records (date when this particular revision was deleted). If you go for the first solution, you would probably need a second field DateDeleted
(only if you need it of course). Depends on you and what you actually want to record.Operations in Design 2 are very trivial:
ModifyIf you go for Design 2, all SQL commands needed to do that will be very very easy, as well as maintenance! Maybe, it will be much much easier if you use the auxiliary columns (RevisionId
, DateModified
) also in the Records table - to keep both tables at exactly the same structure (except for unique keys)! This will allow for simple SQL commands, which will be tolerant to any data structure change:
insert into EmployeeHistory select * from Employe where ID = XX
Don't forget to use transactions!
As for the scaling, this solution is very efficient, since you don't transform any data from XML back and forth, just copying whole table rows - very simple queries, using indices - very efficient!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With