Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Designing archive in database. Some patterns maybe?

We are currently doing an web application which one of the functionality is to create Events by the user. Those events can be later on deleted by the user or administrator. However the client requires that the event is not really physically deleted from database, but just marked as deleted. User should only see non-deleted events, but administrators should be able to browse also through deleted ones. That's all really the functionality there is.

Now I suggested that we should simply add one more extra column called "status", which would have couple of valid values: ACTIVE and DELETED. This way we can distinguish between normal(active) and deleted events and create really simple queries (SELECT * FROM EVENTS WHERE STATUS = 'ACTIVE'). My colleague however disagreed. He pointed out that regardless of the fact that right now active events and deleted events share same information (thus they can be stored in the same table) in a future requirements my change and client for example will need to store some additional information about deleted Event (like date of deletion, who deleted it, why he did it - sort of comment). He said that to fulfil those requirements in a future we would have to add additional columns in EVENTS table that would hold data specific for the deleted Events and not for active events. He proposed a solution, where additional table is created (like DELETED_EVENTS) with same schema as EVENTS table. Every deleted event would be physical deleted from EVENTS table and be moved to DELETED_EVENTS table.

I strongly disagreed with his idea. Not only would it make SQL query more complex and less efficient but also this totally is against YAGNI. I also disagreed with him that my idea would made us to create additional (not nullable) columns in EVENTS table, if the requirements changed in a future. In my scenario I would simply create new table like DELETED_EVENTS_DATA (that would hold those additional, archive data) and would add reference key in the EVENTS table to maintain one to one relationship between EVETNS and DELETED_EVENTS_DATA tables.

Nevertheless I was struggled by the fact that two developers who commonly share similar view on software and database design could have so radically different opinions about how this requirements should be designed in a database level. I thought that we maybe both going in a wrong direction and there is another (third) solution? Or are there more then just one alternative? How do you design this sort of requirements? Are there any patterns or guidelines on how to do it properly? Any help will be deeply appreciated

like image 814
Pawel Szulc Avatar asked Jan 20 '10 22:01

Pawel Szulc


People also ask

What are archives in database?

Data archiving is the practice of identifying data that is no longer active and moving it out of production systems into long-term storage systems. Archival data is stored so that at any time it can be brought back into service.

What are archive tables?

An archive table is a table that stores older rows from another table. The original table is called an archive-enabled table . Db2 can automatically store rows that are deleted from an archive-enabled table in an associated archive table.


2 Answers

Don't use a status column.

At minimum you should have a datedeleted and a deletedby columns. Just knowing something was removed isn't helpful, even if the client isn't asking for it right now the very first time they go to look at the deleted events they will want to know who in order to discern why.

If the events table is likely to grow pretty large in size it is common to move the deleted / archived data into a different table entirely. Usually you will allocate those tables to a different database file. That file usually lives on a different drive in order to keep performance up. I'm not saying a whole new database, just a different database file.

If you keep it in the same table, all of your queries should have a where clause on (DateDeleted is null). Obviously you don't have that requirement if the information is moved to a different table.. Which is why I recommend that way of doing things.

like image 56
NotMe Avatar answered Sep 24 '22 08:09

NotMe


I found that taking snapshots of an object with every event (creation, update, etc.) and storing those snapshots (along with dates and user info) in another table allows you to meet all kinds of historical tracking needs in the lifetime of an application. You can then present the snapshots to the user, present chronological changes to the user, deduce the state of an object on a given date, etc..

I'm sure there are official design patterns out there - this is just one that I've refined over time and it works well. It's not efficient with disk space however.

EDIT: Also, when user deleted an object, I would flag the record as deleted and take a final snapshot for the history table. You could hide the object from the interface indefinitely or you could choose to show it - depends on usage needs.

like image 20
Mayo Avatar answered Sep 23 '22 08:09

Mayo