Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Physical vs. logical (hard vs. soft) delete of database record?

People also ask

What is soft delete and hard delete in database?

Hard deletes are hard to recover from if something goes wrong (application bug, bad migration, manual query, etc.). This usually involves restoring from a backup and it is hard to target only the data affected by the bad delete. Soft deletes are easier to recover from once you determine what happened.

What is physical and logical deletion?

A logical parent is considered logically deleted when all its logical children are physically deleted. For physically paired logical relationships, the physical child paired to the logical child must also be physically deleted before the logical parent is considered logically deleted.

What is soft delete in database?

soft deletion (plural soft deletions) (databases) An operation in which a flag is used to mark data as unusable, without erasing the data itself from the database.

What is SQL logical delete?

Logical deletion is the process of not actually deleting the data, but setting a column called “flag” to make it appear as if the data has been deleted to the user. For example, if you have a tweets table. For example, if you have a tweets table, you can add a boolean column called deleted_flag to the table.


Advantages are that you keep the history (good for auditing) and you don't have to worry about cascading a delete through various other tables in the database that reference the row you are deleting. Disadvantage is that you have to code any reporting/display methods to take the flag into account.

As far as if it is a common practice - I would say yes, but as with anything whether you use it depends on your business needs.

EDIT: Thought of another disadvantange - If you have unique indexes on the table, deleted records will still take up the "one" record, so you have to code around that possibility too (for example, a User table that has a unique index on username; A deleted record would still block the deleted users username for new records. Working around this you could tack on a GUID to the deleted username column, but it's a very hacky workaround that I wouldn't recommend. Probably in that circumstance it would be better to just have a rule that once a username is used, it can never be replaced.)


Are logical deletes common practice? Yes I have seen this in many places. Are they secure? That really depends are they any less secure then the data was before you deleted it?

When I was a Tech Lead, I demanded that our team keep every piece of data, I knew at the time that we would be using all that data to build various BI applications, although at the time we didn't know what the requirements would be. While this was good from the standpoint of auditing, troubleshooting, and reporting (This was an e-commerce / tools site for B2B transactions, and if someone used a tool, we wanted to record it even if their account was later turned off), it did have several downsides.

The downsides include (not including others already mentioned):

  1. Performance Implications of keeping all that data, We to develop various archiving strategies. For example one area of the application was getting close to generating around 1Gb of data a week.
  2. Cost of keeping the data does grow over time, while disk space is cheap, the ammount of infrastructure to keep and manage terrabytes of data both online and off line is a lot. It takes a lot of disk for redundancy, and people's time to ensure backups are moving swiftly etc.

When deciding to use logical, physical deletes, or archiving I would ask myself these questions:

  1. Is this data that might need to be re-inserted into the table. For example User Accounts fit this category as you might activate or deactivate a user account. If this is the case a logical delete makes the most sense.
  2. Is there any intrinsic value in storing the data? If so how much data will be generated. Depending on this I would either go with a logical delete, or implement an archiving strategy. Keep in mind you can always archive logically deleted records.

It might be a little late but I suggest everyone to check Pinal Dave's blog post about logical/soft delete:

I just do not like this kind of design [soft delete] at all. I am firm believer of the architecture where only necessary data should be in single table and the useless data should be moved to an archived table. Instead of following the isDeleted column, I suggest the usage of two different tables: one with orders and another with deleted orders. In that case, you will have to maintain both the table, but in reality, it is very easy to maintain. When you write UPDATE statement to the isDeleted column, write INSERT INTO another table and DELETE it from original table. If the situation is of rollback, write another INSERT INTO and DELETE in reverse order. If you are worried about a failed transaction, wrap this code in TRANSACTION.

What are the advantages of the smaller table verses larger table in above described situations?

  • A smaller table is easy to maintain
  • Index Rebuild operations are much faster
  • Moving the archive data to another filegroup will reduce the load of primary filegroup (considering that all filegroups are on different system) – this will also speed up the backup as well.
  • Statistics will be frequently updated due to smaller size and this will be less resource intensive.
  • Size of the index will be smaller
  • Performance of the table will improve with a smaller table size.

I'm a NoSQL developer, and on my last job, I worked with data that was always critical for someone, and if it was deleted by accident in the same day that was created, I were not able to find it in the last backup from yesterday! In that situation, soft deletion always saved the day.

I did soft-deletion using timestamps, registering the date the document was deleted:

IsDeleted = 20150310  //yyyyMMdd

Every Sunday, a process walked on the database and checked the IsDeleted field. If the difference between the current date and the timestamp was greater than N days, the document was hard deleted. Considering the document still be available on some backup, it was safe to do it.

EDIT: This NoSQL use case is about big documents created in the database, tens or hundreds of them every day, but not thousands or millions. By general, they were documents with the status, data and attachments of workflow processes. That was the reason why there was the possibility of a user deletes an important document. This user could be someone with Admin privileges, or maybe the document's owner, just to name a few.

TL;DR My use case was not Big Data. In that case, you will need a different approach.


One pattern I have used is to create a mirror table and attach a trigger on the primary table, so all deletes (and updates if desired) are recorded in the mirror table.

This allows you to "reconstruct" deleted/changed records, and you can still hard delete in the primary table and keep it "clean" - it also allows the creation of an "undo" function, and you can also record the date, time, and user who did the action in the mirror table (invaluable in witch hunt situations).

The other advantage is there is no chance of accidentally including deleted records when querying off the primary unless you deliberately go to the trouble of including records from the mirror table (you may want to show live and deleted records).

Another advantage is that the mirror table can be independently purged, as it should not have any actual foreign key references, making this a relatively simple operation in comparison to purging from a primary table that uses soft deletes but still has referential connections to other tables.

What other advantages? - great if you have a bunch of coders working on the project, doing reads on the database with mixed skill and attention to detail levels, you don't have to stay up nights hoping that one of them didn’t forget to not include deleted records (lol, Not Include Deleted Records = True), which results in things like overstating say the clients available cash position which they then go buy some shares with (i.e., as in a trading system), when you work with trading systems, you will find out very quickly the value of robust solutions, even though they may have a little bit more initial "overhead".

Exceptions:
- as a guide, use soft deletes for "reference" data such as user, category, etc, and hard deletes to a mirror table for "fact" type data, i.e., transaction history.


I commonly use logical deletions - I find they work well when you also intermittently archive off the 'deleted' data to an archived table (which can be searched if needed) thus having no chance of affecting the performance of the application.

It works well because you still have the data if you're ever audited. If you delete it physically, it's gone!


I'm a big fan of the logical delete, especially for a Line of Business application, or in the context of user accounts. My reasons are simple: often times I don't want a user to be able to use the system anymore (so the account get's marked as deleted), but if we deleted the user, we'd lose all their work and such.

Another common scenario is that the users might get re-created a while after having been delete. It's a much nicer experience for the user to have all their data present as it was before they were deleted, rather than have to re-create it.

I usually think of deleting users more as "suspending" them indefinitely. You never know when they'll legitimately need to be back.