Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Soft Delete - Use IsDeleted flag or separate joiner table?

Should we use a flag for soft deletes, or a separate joiner table? Which is more efficient? Database is SQL Server.

Background Information

A while back we had a DB consultant come in and look at our database schema. When we soft delete a record, we would update an IsDeleted flag on the appropriate table(s). It was suggested that instead of using a flag, store the deleted records in a seperate table and use a join as that would be better. I've put that suggestion to the test, but at least on the surface, the extra table and join looks to be more expensive then using a flag.

Initial Testing

I've set up this test.

Two tables, Example and DeletedExample. I added a nonclustered index on the IsDeleted column.

I did three tests, loading a million records with the following deleted/non deleted ratios:

  • Deleted/NonDeleted
  • 50/50
  • 10/90
  • 1/99

Results - 50/50 50 50

Results - 10/90 10 90

Results - 1/99 enter image description here

Database Scripts, For Reference, Example, DeletedExample, and Index for Example.IsDeleted

CREATE TABLE [dbo].[Example](
    [ID] [int] NOT NULL,
    [Column1] [nvarchar](50) NULL,
    [IsDeleted] [bit] NOT NULL,
 CONSTRAINT [PK_Example] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[Example] ADD  CONSTRAINT [DF_Example_IsDeleted]  DEFAULT ((0)) FOR [IsDeleted]
GO

CREATE TABLE [dbo].[DeletedExample](
    [ID] [int] NOT NULL,
 CONSTRAINT [PK_DeletedExample] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[DeletedExample]  WITH CHECK ADD  CONSTRAINT [FK_DeletedExample_Example] FOREIGN KEY([ID])
REFERENCES [dbo].[Example] ([ID])
GO

ALTER TABLE [dbo].[DeletedExample] CHECK CONSTRAINT [FK_DeletedExample_Example]
GO

CREATE NONCLUSTERED INDEX [IX_IsDeleted] ON [dbo].[Example] 
(
    [IsDeleted] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO
like image 366
Jon Raynor Avatar asked Jan 13 '12 20:01

Jon Raynor


2 Answers

The numbers you have seem to indicate that my initial impression was correct: if your most common query against this database is to filter on IsDeleted = 0, then performance will be better with a simple bit flag, especially if you make wise use of indexes.

If you often query for deleted and undeleted data separately, then you could see a performance gain by having a table for deleted items and another for undeleted items, with identical fields. But denormalizing your data like this is rarely a good idea, as it will most often cost you far more in code maintenance costs than it will gain you in performance increases.

like image 95
StriplingWarrior Avatar answered Oct 30 '22 08:10

StriplingWarrior


I'm not the SQL expert but in my opinion,it all depends on the usage frequency of the database. If the database is accessed by the large number of users and needs to be efficient then usage of a seperate isDeleted table will be good. The better option would be using a flag during the production time and as a part of daily/weekly/monthly maintanace you may move all the soft deleted records to the isDeleted table and clear the production table of soft deleted records. The mixture of both option will be good a good one.

like image 41
blackwind Avatar answered Oct 30 '22 08:10

blackwind