Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delete records from a database or simply hide them during Reads?

I'm wondering if someone can provide various rationales/solutions for knowing when to delete records from a database vs. simplying hiding them during read operations via a field value, e.g., is_hidden=1.

My application is a social network/e-commerce web application. I tend to favor the is_hidden strategy but as one's site grows I can see this leading to a really badly performing site.

Here's my list. What items on the list am I missing? Is the list's prioritization good?

Delete:

  1. rationale: Reduce table size/improve database performance
  2. rationale: Useful if data is trivial to create
  3. solution: SQL DELETE

is_hidden:

  1. rationale: allow users/DBA to restore data/useful for sensitive & hard to CREATE data
  2. rationale: can DELETE it later if necessary
  3. solution: SQL SELECT ... WHERE is_hidden!=1

Thoughts?

like image 671
tim peterson Avatar asked Feb 18 '23 09:02

tim peterson


2 Answers

The major reason you might want to do a soft-delete is where an audit trail requires it. For example we might have an invoice table along with an voided column and we might normally just omit voided invoices. This preserves an audit trail so we know what invoices were entered and which ones were voided.

There are many fields (particularly in finance) where soft deletes are preferred for this reason. Typically the number of deletes are small compared to the data set, and you don't want to really delete because actually doing so might allow someone to cover for theft of money or real-world goods. The "deleted" data can then be shown for those queries which require it.

A good non-db example would be as follows: "When writing in your general journal or general ledger, write with a pen and if you make an error that you spot right away, cross it out with a single line so that the original data is still legible, and write correct values underneath. If you find out later, either write in an adjustment entry or write in a reversal and a new one." In that case, your principle reason is to see what was changed when so that you can audit those changes if there is ever a question.

The people typically needing to see such information are likely to be financial or other auditors.

like image 138
Chris Travers Avatar answered Feb 19 '23 23:02

Chris Travers


You've already said everthing in your question:

DELETE will entirely delete the entry and

is_hidden=1 will hide it.

So: If there's the possibility that you will need the data in the future you should use the hiding method. If you are sure that the data will never ever be used again: Use delete.

Concerning performance:

You can use two tables:

  • 1 for visible items
  • 1 for the hidden ones

Or even three tables:

  • 1 for visible
  • 1 for hidden
  • 1 as an archive, where you move all the hidden data that's older than 3 years or something

Or:

  • 1 for visible and hidden ones (using the is_hidden flag)
  • 1 as an archive for old entries

It's all up to you. But if you look at facebook or google: They will never ever delete anything! Data == Money == Power ;)

like image 24
Benjamin M Avatar answered Feb 19 '23 22:02

Benjamin M