Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Handling Revisions within Oracle

I have a table say:

CREATE TABLE "DataNode" (
   "ID" NUMBER(7,0),
   "TYPE" NUMBER(7,0),
   "NAME" VARCHAR2(100),
   "STATUS" NUMBER(7,0),
   "REVISION" NUMBER(4,0),
   "MODIFIEDAT" DATE
 );

 CREATE TABLE "DataNode_Revisions" (
   "ID" NUMBER(7,0),
   "NODEID" NUMBER(7,0),
   "TYPE" NUMBER(7,0),
   "NAME" VARCHAR2(100),
   "STATUS" NUMBER(7,0),
   "REVISION" NUMBER(4,0),
   "MODIFIEDAT" DATE
 ) COMPRESS;

So I have these two tables. I do all my reads from "DataNode" and when a change occurs I write out the current entry to "DataNode_Revisions" and then modify my existing "DataNode" record. Makes sense?

Is this the best way to go about it? I can already tell I am going to run into problems when the Schema changes. I am not seeing a better alternative but if there is one please let me know! I assume keeping this all in one table would result in massive performance losses woudl it not? I mean I would be more then quadrupling the number of records and there is already quite a few. I think Drupal stores node revisions like this, and I am curious how they do not suffer performance problems from it.

"DataNode" is constantly being read by a lot of users. However, very few writes ever occur. "DataNode_Revisions" is only read from on occasion. I am just worried about maintaining so many tables. "DataNode" is one of ~25 tables very similar to this one.

like image 473
jreid42 Avatar asked Nov 23 '10 16:11

jreid42


4 Answers

Whether there will be any performance implications from storing the old rows in the DataNode table depends on how the DataNode rows are accessed. If the reads are all single-row lookups for the current row, the number of rows in the table is relatively immaterial-- it's not going to take any more work to find the current row for a particular ID than it would to get the row for that ID from the current DataNode table (I'm assuming here that ID is the key for the table). On the other hand, if you have a number of queries that are doing table scans of the DataNode table, then quadrupling the number of rows will increase the time required to run those queries.

If you want to go down the path of putting the historical rows in the DataNode table, you would probably want to add an EXPIRATION_DATE column that is NULL for the current row and populated for the expired rows. You could then create a function-based index based on the EXPIRATION_DATE that would have data for only the current rows, i.e.

CREATE INDEX idx_current_ids
    ON DataNode( (CASE WHEN expiration_date IS NULL THEN id ELSE null END) );

which would be used in a query like

SELECT *
  FROM DataNode
 WHERE (CASE WHEN expiration_date IS NULL THEN id ELSE null END) = <<some id>>

Obviously, you'd probably want to create a view that has this condition rather than rewriting it every time you need the current row, i.e.

CREATE VIEW CurrentDataNode
AS
SELECT (CASE WHEN expiration_date IS NULL THEN id ELSE null END) id,
       type,
       name,
       status
  FROM DataNode;

SELECT *
  FROM CurrentDataNode
 WHERE id = <<some value>>
like image 55
Justin Cave Avatar answered Nov 11 '22 20:11

Justin Cave


I usually use triggers to do the writing to the 'Revisions' table. Yes, schema changes force you to update the mirror table and trigger/archive function.

I think you will regret keeping all your history as well as the current revision in a single table, so I think you've got the right idea.

If you want to try to come up with a generic solution that doesn't require a mirror table for every one of your transactional tables you might consider having just a single revisions table where you convert records to XML and store that in a clob... not very useful if you have to access it often or quickly, but good if you're really just wanting to archive everything.

like image 35
aw crud Avatar answered Nov 11 '22 18:11

aw crud


It's going to depend on the application. If you're on 11g, you might want to look at the new Flashback Data Archive. I'm just starting to look at it to keep history on all our financial and other critical data.

like image 2
JOTN Avatar answered Nov 11 '22 18:11

JOTN


You have a few options. What is the business requirement that forces you to keep track of data changes?

  • if you only need to keep changes for some "short" period of time, you could read the data from UNDO using flashback query.. select * from table as of timestamp (bla);

  • if you need to retain this information long term, take a look at t feature called Oracle Total Recall. It does the same as Flashback Query, but retains the changes indefinitely.

  • if you need something simpler, don't have the app insert the "old" version of the rows. Use a trigger that populates the data.

  • if the system is extremely busy, you can decouple the two tables by having an intermediary table that you use as a "queue"

like image 2
erbsock Avatar answered Nov 11 '22 19:11

erbsock