I have several database tables and I need to ensure that certain columns together are always unique. I currently use a unique constraint like this:
ALTER TABLE [dbo].[MyTable]
ADD CONSTRAINT [AK_MyTable_Unique_Cols]
UNIQUE NONCLUSTERED ([Field_1] ASC, [Field_2] ASC,
[Field_3] ASC, [FkDeliveryId] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
where the table looks like this (note that Sum is not part of the constraint):
Field_1 | Field_2 | Field_3 | FkDeliveryId | Sum
Foo | Foo | Bar | 1 | 100
Foo | Bar | Bar | 1 | 900
Bar | Foo | Foo | 1 | 400
Bar | Foo | Bar | 2 | 800 // Not unique
Foo | Foo | Bar | 2 | 600
Bar | Foo | Bar | 2 | 300 // Not unique
But the problem is that the tables are dynamically created via C# and some tables will have more than 16 columns. So when I tried to create the constraint on a table with 52 columns I got this error:
The index '' on table 'dbo.MyTable' has 52 columns in the key list. The maximum limit for index key column list is 16. Could not create constraint or index. See previous errors.
So now I'm looking for another solution. My SQL knowledge is limited to querying the DB, not to constraints, so please be patient with me. :)
My question is: how can I ensure that no row in my table is a duplicate (based on a selected set of columns), even with more than 16 columns?
The tables can have different numbers of columns, and the columns may be of different data types.
I've seen this question and like the hash alternative. But will it work when I have 50+ columns and millions of rows?
Will the hash always be unique?
Update based on comments:
The tables are used to store data from files that are imported. I do not know the size of the files or how many columns they have. The import runs as a scheduled job, so performance when creating the table is not very important. The data must be persistent; however, the constraint is really only needed to ensure that each insertion does not introduce a duplicate row. A column may in theory be varchar(max), which would cause the hash column to get very big.
No, the hash won't always be unique; hash collisions do occur.
This seems an odd requirement. Usually the key can be created on a subset of columns in the table.
A generic solution that gets around the 16-column limit is to create a computed column that concatenates all the columns with a delimiter that is unlikely to occur in the data, and then create a unique index on that column, which is basically what your linked question suggests.
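For example, a minimal sketch of that approach, assuming placeholder columns Col1, Col2 and Col3 on dbo.MyTable rather than your real, dynamically generated schema:
-- Sketch only: CONCAT needs SQL Server 2012+ and treats NULL as an empty string.
-- Non-character columns may need an explicit CONVERT with a fixed style so the
-- expression stays deterministic and can be persisted and indexed.
ALTER TABLE dbo.MyTable
    ADD AllKeyCols AS CONCAT(Col1, '|', Col2, '|', Col3) PERSISTED;

ALTER TABLE dbo.MyTable
    ADD CONSTRAINT AK_MyTable_AllKeyCols UNIQUE NONCLUSTERED (AllKeyCols);
GO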
There is a 900-byte limit on the overall index key size, though. If you need to support arbitrary column lengths, including varchar(max), that may well exceed this limit, so you can't do it with declarative constraints and will need some procedural code. The best you can do is create a hash, put a non-unique index on it, and have your insert process check any hash duplicates to see if they are in fact real duplicates. This could happen in a trigger, to ensure it is always checked, or in the ETL process itself, which would likely be more efficient.
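A sketch of the hash variant, again with the placeholder columns Col1, Col2 and Col3 (note that HASHBYTES input is limited to 8000 bytes before SQL Server 2016, so very wide rows would need SQL Server 2016+ or hashing on the client side):
-- SHA2_256 makes collisions unlikely but not impossible, hence the
-- non-unique index plus the duplicate check described below.
ALTER TABLE dbo.MyTable
    ADD HashValue AS CAST(HASHBYTES('SHA2_256',
                          CONCAT(Col1, '|', Col2, '|', Col3)) AS BINARY(32)) PERSISTED;

CREATE NONCLUSTERED INDEX IX_MyTable_HashValue
    ON dbo.MyTable (HashValue);
GO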
If you do it in a trigger, it helps to add an identity column to the table. The code to identify duplicates would then be:
SELECT *
FROM   Inserted I
       JOIN BaseTable B
         ON  I.HashValue = B.HashValue
         AND I.Id <> B.Id
         /* null-safe check that the remaining columns really are identical,
            i.e. a genuine duplicate rather than just a hash collision:
            http://sqlblog.com/blogs/paul_white/archive/2011/06/22/undocumented-query-plans-equality-comparisons.aspx */
         AND EXISTS (SELECT B.Col1, B.Col2
                     INTERSECT
                     SELECT I.Col1, I.Col2)
If the above returns any rows, you have a violation and can roll back the transaction.
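Put together, a trigger along these lines would enforce the check. This is only a sketch: dbo.MyTable, the Id identity column, HashValue and Col1/Col2 are the placeholder names used above, and THROW needs SQL Server 2012+.
CREATE TRIGGER TR_MyTable_NoDuplicates
ON dbo.MyTable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT *
               FROM   Inserted I
                      JOIN dbo.MyTable B
                        ON  I.HashValue = B.HashValue
                        AND I.Id <> B.Id
                        -- null-safe check that the rows really are identical
                        AND EXISTS (SELECT B.Col1, B.Col2
                                    INTERSECT
                                    SELECT I.Col1, I.Col2))
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50000, 'Duplicate row detected; insert rolled back.', 1;
    END;
END;
GO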