Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Enforcing mutual uniqueness across multiple columns

I'm trying to find an intuitive way of enforcing mutual uniqueness across two columns in a table. I am not looking for composite uniqueness, where duplicate combinations of keys are disallowed; rather, I want a rule where any of the keys cannot appear again in either column. Take the following example:

CREATE TABLE Rooms
(
    Id INT NOT NULL PRIMARY KEY,
)

CREATE TABLE Occupants
(
    PersonName VARCHAR(20),
    LivingRoomId INT NULL REFERENCES Rooms (Id),
    DiningRoomId INT NULL REFERENCES Rooms (Id),
)

A person may pick any room as their living room, and any other room as their dining room. Once a room has been allocated to an occupant, it cannot be allocated again to another person (whether as a living room or as a dining room).

I'm aware that this issue can be resolved through data normalization; however, I cannot change the schema make breaking changes to the schema.

Update: In response to the proposed answers:

Two unique constraints (or two unique indexes) will not prevent duplicates across the two columns. Similarly, a simple LivingRoomId != DiningRoomId check constraint will not prevent duplicates across rows. For example, I want the following data to be forbidden:

INSERT INTO Rooms VALUES (1), (2), (3), (4)
INSERT INTO Occupants VALUES ('Alex',    1, 2)
INSERT INTO Occupants VALUES ('Lincoln', 2, 3)

Room 2 is occupied simultaneously by Alex (as a living room) and by Lincoln (as a dining room); this should not be allowed.

Update2: I've run some tests on the three main proposed solutions, timing how long they would take to insert 500,000 rows into the Occupants table, with each row having a pair of random unique room ids.

Extending the Occupants table with unique indexes and a check constraint (that calls a scalar function) causes the insert to take around three times as long. The implementation of the scalar function is incomplete, only checking that new occupants' living room does not conflict with existing occupants' dining room. I couldn't get the insert to complete in reasonable time if the reverse check was performed as well.

Adding a trigger that inserts each occupant's room as a new row into another table decreases performance by 48%. Similarly, an indexed view takes 43% longer. In my opinion, using an indexed view is cleaner, since it avoids the need for creating another table, as well as allows SQL Server to automatically handle updates and deletes as well.

The full scripts and results from the tests are given below:

SET STATISTICS TIME OFF
SET NOCOUNT ON

CREATE TABLE Rooms
(
    Id INT NOT NULL PRIMARY KEY IDENTITY(1,1),
    RoomName VARCHAR(10),
)

CREATE TABLE Occupants
(
    Id INT NOT NULL PRIMARY KEY IDENTITY(1,1),
    PersonName VARCHAR(10),
    LivingRoomId INT NOT NULL REFERENCES Rooms (Id),
    DiningRoomId INT NOT NULL REFERENCES Rooms (Id)
)

GO

DECLARE @Iterator INT = 0
WHILE (@Iterator < 10)
BEGIN
    INSERT INTO Rooms
    SELECT TOP (1000000) 'ABC'
    FROM sys.all_objects s1 WITH (NOLOCK)
        CROSS JOIN sys.all_objects s2 WITH (NOLOCK)
        CROSS JOIN sys.all_objects s3 WITH (NOLOCK);
    SET @Iterator = @Iterator + 1
END;

DECLARE @RoomsCount INT = (SELECT COUNT(*) FROM Rooms);

SELECT TOP 1000000 RoomId
INTO ##RandomRooms
FROM 
(
    SELECT DISTINCT
        CAST(RAND(CHECKSUM(NEWID())) * @RoomsCount AS INT) + 1 AS RoomId
    FROM sys.all_objects s1 WITH (NOLOCK)
        CROSS JOIN sys.all_objects s2 WITH (NOLOCK)

) s

ALTER TABLE ##RandomRooms
ADD Id INT IDENTITY(1,1)

SELECT
    'XYZ' AS PersonName,
    R1.RoomId AS LivingRoomId,
    R2.RoomId AS DiningRoomId
INTO ##RandomOccupants
FROM ##RandomRooms R1
    JOIN ##RandomRooms R2
        ON  R2.Id % 2 = 0
        AND R2.Id = R1.Id + 1

GO

PRINT CHAR(10) + 'Test 1: No integrity check'

CHECKPOINT;
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SET NOCOUNT OFF
SET STATISTICS TIME ON

INSERT INTO Occupants
SELECT *
FROM ##RandomOccupants

SET STATISTICS TIME OFF
SET NOCOUNT ON

TRUNCATE TABLE Occupants

PRINT CHAR(10) + 'Test 2: Unique indexes and check constraint'

CREATE UNIQUE INDEX UQ_LivingRoomId
ON Occupants (LivingRoomId)

CREATE UNIQUE INDEX UQ_DiningRoomId
ON Occupants (DiningRoomId)

GO

CREATE FUNCTION CheckExclusiveRoom(@occupantId INT)
RETURNS BIT AS
BEGIN
RETURN 
(
    SELECT CASE WHEN EXISTS
    (
        SELECT *
        FROM Occupants O1
            JOIN Occupants O2
                ON O1.LivingRoomId = O2.DiningRoomId
             -- OR O1.DiningRoomId = O2.LivingRoomId
        WHERE O1.Id = @occupantId
    )
    THEN 0
    ELSE 1
    END
)
END

GO

ALTER TABLE Occupants
ADD CONSTRAINT ExclusiveRoom 
CHECK (dbo.CheckExclusiveRoom(Id) = 1)

CHECKPOINT;
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SET NOCOUNT OFF
SET STATISTICS TIME ON

INSERT INTO Occupants
SELECT *
FROM ##RandomOccupants

SET STATISTICS TIME OFF
SET NOCOUNT ON

ALTER TABLE Occupants DROP CONSTRAINT ExclusiveRoom
DROP INDEX UQ_LivingRoomId ON Occupants
DROP INDEX UQ_DiningRoomId ON Occupants
DROP FUNCTION CheckExclusiveRoom

TRUNCATE TABLE Occupants

PRINT CHAR(10) + 'Test 3: Insert trigger'

CREATE TABLE RoomTaken 
(
    RoomId INT NOT NULL PRIMARY KEY REFERENCES Rooms (Id) 
)

GO

CREATE TRIGGER UpdateRoomTaken
ON Occupants
AFTER INSERT
AS 
    INSERT INTO RoomTaken
    SELECT RoomId
    FROM
    (
        SELECT LivingRoomId AS RoomId
        FROM INSERTED
            UNION ALL
        SELECT DiningRoomId AS RoomId
        FROM INSERTED
    ) s

GO  

CHECKPOINT;
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SET NOCOUNT OFF
SET STATISTICS TIME ON

INSERT INTO Occupants
SELECT *
FROM ##RandomOccupants

SET STATISTICS TIME OFF
SET NOCOUNT ON

DROP TRIGGER UpdateRoomTaken
DROP TABLE RoomTaken

TRUNCATE TABLE Occupants

PRINT CHAR(10) + 'Test 4: Indexed view with unique index'

CREATE TABLE TwoRows
(
    Id INT NOT NULL PRIMARY KEY
)

INSERT INTO TwoRows VALUES (1), (2)

GO

CREATE VIEW OccupiedRooms
WITH SCHEMABINDING
AS
    SELECT RoomId = CASE R.Id WHEN 1 
                    THEN O.LivingRoomId 
                    ELSE O.DiningRoomId 
                    END
    FROM dbo.Occupants O
        CROSS JOIN dbo.TwoRows R

GO

CREATE UNIQUE CLUSTERED INDEX UQ_OccupiedRooms
ON OccupiedRooms (RoomId);

CHECKPOINT;
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SET NOCOUNT OFF
SET STATISTICS TIME ON

INSERT INTO Occupants
SELECT *
FROM ##RandomOccupants

SET STATISTICS TIME OFF
SET NOCOUNT ON

DROP INDEX UQ_OccupiedRooms ON OccupiedRooms
DROP VIEW OccupiedRooms
DROP TABLE TwoRows

TRUNCATE TABLE Occupants

DROP TABLE ##RandomRooms
DROP TABLE ##RandomOccupants

DROP TABLE Occupants
DROP TABLE Rooms


/* Results:

Test 1: No integrity check

 SQL Server Execution Times:
   CPU time = 5210 ms,  elapsed time = 10853 ms.

(500000 row(s) affected)

Test 2: Unique indexes and check constraint

 SQL Server Execution Times:
   CPU time = 21996 ms,  elapsed time = 27019 ms.

(500000 row(s) affected)

Test 3: Insert trigger
SQL Server parse and compile time: 
   CPU time = 5663 ms, elapsed time = 11192 ms.

 SQL Server Execution Times:
   CPU time = 4914 ms,  elapsed time = 4913 ms.

(1000000 row(s) affected)

 SQL Server Execution Times:
   CPU time = 10577 ms,  elapsed time = 16105 ms.

(500000 row(s) affected)

Test 4: Indexed view with unique index

 SQL Server Execution Times:
   CPU time = 10171 ms,  elapsed time = 15777 ms.

(500000 row(s) affected)

*/
like image 936
Douglas Avatar asked Jul 03 '14 20:07

Douglas


People also ask

How do you enforce conditional unique multiple columns?

We implement this using CASE WHEN ensconced in the CREATE UNIQUE INDEX statement: SQL> -- Conditional unique index on multiple columns SQL> create unique index demo_fbi_idx 2 on demo_fbi 3 (case when active_flag = 'Y' then 4 col1 else null end, 5 case when active_flag = 'Y' then 6 col2 else null end); Index created.

Can unique constraints be multiple?

Both the UNIQUE and PRIMARY KEY constraints provide a guarantee for uniqueness for a column or set of columns. A PRIMARY KEY constraint automatically has a UNIQUE constraint. However, you can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per table.

What is unique index with multiple columns?

A unique index ensures the index key columns do not contain any duplicate values. A unique index may consist of one or many columns. If a unique index has one column, the values in this column will be unique. In case the unique index has multiple columns, the combination of values in these columns is unique.

Can a table have multiple unique columns?

Yes, you can define UNIQUE constraints to columns other than the primary key in order to ensure the data is unique between rows.


2 Answers

I think the only way to do this is to use constraint and a Function.

Pseudo code (haven't done this for a long time):

CREATE FUNCTION CheckExlusiveRoom
RETURNS bit
declare @retval bit
set @retval = 0
    select retval = 1 
      from Occupants as Primary
      join Occupants as Secondary
        on Primary.LivingRoomId = Secondary.DiningRoomId
     where Primary.ID <> Secondary.ID
        or (   Primary.DiningRoomId= Secondary.DiningRoomId
            or Primary.LivingRoomId = Secondary.LivingRoomID)
return @retval
GO

Then, use this function in a check constraint....

Alternative would be to use an intermediate table OccupiedRoom, where you would always insert rooms that are used (by trigger for instance? ) and FK to it instead of Room table

Reaction to comment:

Do you need to enforce it directly on the table, or is a constraint violation happening in reaction to insert/update enough? Because then I am thinking like this:

  1. create a simple table:

    create table RoomTaken (RoomID int primary key references Room (Id) )
    
  2. create a trigger on insert/update/delete, that makes sure that any Room used in Occupants is kept in RoomID as well.

  3. If you try to duplicate room usage, RoomTaken table will throw an PK violation

Not sure if this is enough and/or how it would compare in speed to the UDF (I assume it would be superior).

And yes, I see the problem that RoomTaken would not FK to the usage in Occupants, but... really, you are working under some constraints and there is no perfect solution - it's speed (UDF) vs 100% integrity enforcement in my opinion.

like image 87
Vladislav Zalesak Avatar answered Oct 08 '22 20:10

Vladislav Zalesak


You could create an "external" constraint in the form of an indexed view:

CREATE VIEW dbo.OccupiedRooms
WITH SCHEMABINDING
AS
SELECT r.Id
FROM   dbo.Occupants AS o
INNER JOIN dbo.Rooms AS r ON r.Id IN (o.LivingRoomId, o.DiningRoomId)
;
GO

CREATE UNIQUE CLUSTERED INDEX UQ_1 ON dbo.OccupiedRooms (Id);

The view is essentially unpivoting the occupied rooms' IDs, putting them all in one column. The unique index on that column makes sure it does not have duplicates.

Here are demonstrations of how this method works:

  • failed insert;

  • successful insert.

UPDATE

As hvd has correctly remarked, the above solution does not catch attempts to insert identical LivingRoomId and DiningRoomId when they are put on the same row. This is because the dbo.Rooms table is matched only once in that case and, therefore, the join produces produces just one row for the pair of references.

One way to fix that is suggested in the same comment: additionally to the indexed view, use a CHECK constraint on the dbo.OccupiedRooms table to prohibit rows with identical room IDs. The suggested LivingRoomId <> DiningRoomId condition, however, will not work for cases where both columns are NULL. To account for that case, the condition could be expanded to this one:

LivingRoomId <> DinindRoomId AND (LivingRoomId IS NOT NULL OR DinindRoomId IS NOT NULL)

Alternatively, you could change the view's SELECT statement to catch all situations. If LivingRoomId and DinindRoomId were NOT NULL columns, you could avoid a join to dbo.Rooms and unpivot the columns using a cross-join to a virtual 2-row table:

SELECT  Id = CASE x.r WHEN 1 THEN o.LivingRoomId ELSE o.DiningRoomId END
FROM    dbo.Occupants AS o
CROSS
JOIN    (SELECT 1 UNION ALL SELECT 2) AS x (r)

However, as those columns allow NULLs, this method would not allow you to insert more than one single-reference row. To make it work in your case, you would need to filter out NULL entries, but only if they come from rows where the other reference is not NULL. I believe adding the following WHERE clause to the above query would suffice:

WHERE o.LivingRoomId IS NULL AND o.DinindRoomId IS NULL
   OR x.r = 1 AND o.LivingRoomId IS NOT NULL
   OR x.r = 2 AND o.DinindRoomId IS NOT NULL
like image 7
Andriy M Avatar answered Oct 08 '22 20:10

Andriy M