
Soft Delete vs. DB Archive

Suggested Reading

  • Similar: Are soft deletes a good idea?
  • Good Article: http://weblogs.asp.net/fbouma/archive/2009/02/19/soft-deletes-are-bad-m-kay.aspx

    How I ended up here

    I strongly believe that, when making software, anything done up front to minimize work later on pays off in truckloads. As such, I am trying to make sure that my approach to the database schema and its maintenance preserves relational integrity without being archaic or overly complex.

    This resulted in a sort of shudder when looking at the typical delete approach, CASCADE. Yikes, a little over the top for my current situation. I wanted to maintain relational graph integrity, but I didn't want to remove every graph just because one part of the chain was irrelevant. Therefore I chose to go the way of soft deleting to make sure data integrity would remain while records could be removed from relevance. I accomplished this by adding a "DateDeleted" field to every, sigh, table in the database.
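    For illustration, here is a minimal sketch of the "DateDeleted" approach described above, using Python's built-in sqlite3 module. The Author and Post tables, their columns, and the helper names are assumptions made up for this example; only the "DateDeleted" field comes from the question.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: every table carries its own DateDeleted column.
    CREATE TABLE Author (Id INTEGER PRIMARY KEY, Name TEXT, DateDeleted TEXT);
    CREATE TABLE Post (
        Id INTEGER PRIMARY KEY,
        AuthorId INTEGER REFERENCES Author(Id),
        Title TEXT,
        DateDeleted TEXT
    );
    INSERT INTO Author (Id, Name) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO Post (Id, AuthorId, Title) VALUES (10, 1, 'first'), (11, 2, 'second');
""")


def soft_delete_author(conn, author_id):
    """Mark the author as deleted instead of removing the row."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute("UPDATE Author SET DateDeleted = ? WHERE Id = ?", (now, author_id))


def visible_posts(conn):
    """Return titles of posts whose own row and author row are both not deleted."""
    rows = conn.execute("""
        SELECT p.Title
        FROM Post p
        JOIN Author a ON a.Id = p.AuthorId
        WHERE p.DateDeleted IS NULL
          AND a.DateDeleted IS NULL   -- forget this and "deleted" data leaks back in
        ORDER BY p.Id
    """).fetchall()
    return [r[0] for r in rows]


soft_delete_author(conn, 1)
titles = visible_posts(conn)
total_authors = conn.execute("SELECT COUNT(*) FROM Author").fetchone()[0]
```

    The row for the deleted author is still physically present, so the graph stays intact, but every query that touches either table has to repeat the DateDeleted filter.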

    Turning Point

    However, this is clearly starting to add too much complexity and work to be worth it. I am including logic where it should not go and do not feel like perpetuating these bad practices throughout my whole application. In short, I am going to roll back this implementation.

    When looking up whether or not people like soft-deleting, it seems there is a lot of support for it. In fact, the linked "Similar" post up top sports a top-voted answer of "I always soft-delete". Moreover, the majority of answers there and around SO include some sort of "isDeleted" or "isActive" type of approach.

    New Implementation Idea

    The "Good Article" linked covers some of the issues I actually began encountering. It also suggests an alternative to soft-deleting which I found spot on from a best practices standpoint. The suggestion is to use an "Archiving Database", which I had actually considered when looking at soft deleting. I decided against it because of the point I made earlier about CASCADE deleting: I am wary of removing entire graphs from the database because one part of the chain is removed. However, the graph could at least be recovered from the archive, so I am not sure that it would really be that terrible.

    Crossroads

    So, should I just keep adding logic, logic, logic....logic? Or, should I consider making the archival database where most of the logic would simply sit in a very complex graph management class to store / restore relational object graphs? The latter seems to be best practice to me.

    Travis J asked Mar 26 '12 22:03



    1 Answer

    Soft deleting is definitely an easy approach in theory. However, little attention is paid to what to do with the data that wasn't deleted. In fact, it is glossed over.

    In my opinion this is because the wrong issue is in focus. The question is not just "what does deleting mean", but what IS being deleted. When a record is to be removed, what is really being removed is a node in a graph - not just a single record. Preserving the integrity of that whole graph is the reason people bandaid over the issue with "soft deletes". These bandaid solutions tend to hide the gangrene underneath - a festering problem which only gets worse with time.

    What's worse is that, in order to accommodate soft deletes, logic must be included all over (many times breaking various conventions and implementing anti-patterns) to account for the possible breaks in the object graph. Moreover, what kind of business logic is "isDeleted"?!

    I believe a very strong solution to this problem, the problem of removing an object while retaining the referential integrity of the object graph, is to use an archival pattern. On delete of an object, the object is archived and then deleted. The archive database, a mirror database with metadata (temporal database design can be used and is very relevant here), would then receive the object to be archived and could restore it if necessary.
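    As a rough sketch of the archive-then-delete flow, assuming two separate SQLite databases stand in for the live and archive databases: the Author table, the ArchivedRow layout, the JSON payload, and the function names are all hypothetical choices for this example, not a prescribed design.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Live database (hypothetical schema).
live = sqlite3.connect(":memory:")
live.row_factory = sqlite3.Row
live.executescript("""
    CREATE TABLE Author (Id INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Author (Id, Name) VALUES (1, 'alice'), (2, 'bob');
""")

# Archive database: the row payload plus metadata about the deletion.
archive = sqlite3.connect(":memory:")
archive.execute("""
    CREATE TABLE ArchivedRow (
        SourceTable TEXT,
        SourceId    INTEGER,
        Payload     TEXT,   -- the deleted row serialized as JSON
        DeletedAt   TEXT,
        DeletedBy   TEXT,
        PRIMARY KEY (SourceTable, SourceId)
    )
""")


def archive_and_delete_author(author_id, deleted_by):
    """Copy the author into the archive, then remove it from the live database."""
    row = live.execute("SELECT * FROM Author WHERE Id = ?", (author_id,)).fetchone()
    if row is None:
        return False
    now = datetime.now(timezone.utc).isoformat()
    # Archive first: if the delete below fails, nothing has been lost.
    archive.execute(
        "INSERT INTO ArchivedRow VALUES (?, ?, ?, ?, ?)",
        ("Author", author_id, json.dumps(dict(row)), now, deleted_by),
    )
    archive.commit()
    live.execute("DELETE FROM Author WHERE Id = ?", (author_id,))
    live.commit()
    return True


def restore_author(author_id):
    """Move an archived author back into the live database."""
    rec = archive.execute(
        "SELECT Payload FROM ArchivedRow WHERE SourceTable = 'Author' AND SourceId = ?",
        (author_id,),
    ).fetchone()
    if rec is None:
        return False
    live.execute("INSERT INTO Author (Id, Name) VALUES (:Id, :Name)", json.loads(rec[0]))
    live.commit()
    archive.execute(
        "DELETE FROM ArchivedRow WHERE SourceTable = 'Author' AND SourceId = ?",
        (author_id,),
    )
    archive.commit()
    return True


archived_ok = archive_and_delete_author(1, deleted_by="admin")
live_count = live.execute("SELECT COUNT(*) FROM Author").fetchone()[0]
```

    Because the two databases commit separately, this sketch is not atomic; a real implementation would need to decide how to recover if the process dies between the archive insert and the live delete.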

    This makes it straightforward to avoid listing or including a deleted object, as the relevant database will no longer hold it. Now, the same logic which was applied when looking for "isDeleted", "isActive", or "DeletedDate" can be applied in the correct place (not all over the place): to the foreign keys of retrieved objects. When a foreign key is present but the object is not, there is now a logical explanation and a logical set of options. Display that the referenced object was deleted, along with some course of action: "Restore, Delete Current Containing Object, View Deleted". These options can either be chosen by the user or explicitly defined in code in a logical manner. Depending on how advanced the archival database is, perhaps more options exist, such as showing who deleted it, when, why, etc.
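    The dangling-foreign-key check might look something like the sketch below. It assumes, purely for brevity, that the archive lives in an ArchivedAuthor table in the same SQLite file and that Post.AuthorId is not declared as an enforced foreign key, so the reference is allowed to outlive the author. The table names, sample rows, and the resolve_author helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Author (Id INTEGER PRIMARY KEY, Name TEXT);
    -- AuthorId is deliberately not an enforced constraint in this sketch.
    CREATE TABLE Post (Id INTEGER PRIMARY KEY, AuthorId INTEGER, Title TEXT);
    CREATE TABLE ArchivedAuthor (
        Id INTEGER PRIMARY KEY, Name TEXT, DeletedBy TEXT, DeletedAt TEXT
    );
    INSERT INTO Author VALUES (2, 'bob');
    INSERT INTO Post VALUES (10, 1, 'first'), (11, 2, 'second');
    -- Author 1 was archived and deleted earlier (sample data).
    INSERT INTO ArchivedAuthor VALUES (1, 'alice', 'admin', '2020-01-01T00:00:00');
""")

OPTIONS_WHEN_ARCHIVED = ["restore", "delete containing object", "view deleted"]


def resolve_author(conn, post_id):
    """Follow Post.AuthorId and report whether the author is live, archived, or missing."""
    post = conn.execute("SELECT AuthorId FROM Post WHERE Id = ?", (post_id,)).fetchone()
    if post is None:
        raise LookupError(f"no post with id {post_id}")
    author_id = post[0]

    live = conn.execute("SELECT Name FROM Author WHERE Id = ?", (author_id,)).fetchone()
    if live is not None:
        return {"status": "live", "name": live[0], "options": []}

    # Foreign key present but object absent: look for an explanation in the archive.
    archived = conn.execute(
        "SELECT Name, DeletedBy, DeletedAt FROM ArchivedAuthor WHERE Id = ?",
        (author_id,),
    ).fetchone()
    if archived is not None:
        return {
            "status": "archived",
            "name": archived[0],
            "deleted_by": archived[1],
            "deleted_at": archived[2],
            "options": list(OPTIONS_WHEN_ARCHIVED),
        }
    return {"status": "missing", "name": None, "options": []}


dangling = resolve_author(conn, 10)
intact = resolve_author(conn, 11)
```

    The check lives in one place, at the point where the reference is followed, rather than as a filter repeated in every query.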

    Travis J answered Sep 27 '22 20:09