I'm trying to figure out the best way to insert a record into a single table, but only if the item doesn't already exist. The KEY in this case is an NVARCHAR(400) field. For this example, let's pretend it's a word in the Oxford English Dictionary (or insert your favourite dictionary here). Also, I'm guessing I will need to make the Word field a primary key (the table will also have a unique identifier PK).
So I might get a batch of words that I need to add to the table.
So traditionally, I would try something like the following:
IF NOT EXISTS (SELECT WordID FROM Words WHERE Word = @Word)
    INSERT INTO Words (Word) VALUES (@Word)
i.e. if the word doesn't exist, then insert it.
Now, the problem I'm worried about is that we're getting LOTS of hits. Is it possible that the word could be inserted by another process in between the SELECT and the INSERT, which would then throw a constraint error (i.e. a race condition)?
I then thought that I might be able to do the following:
INSERT INTO Words (Word) SELECT @Word WHERE NOT EXISTS (SELECT WordID FROM Words WHERE Word = @Word)
Basically, insert a word when it doesn't exist.
Bad syntax aside, I'm not sure if this is bad or good because of how it locks the table (if it does), and whether it performs well on a table that is getting massive reads and plenty of writes.
So, what do you SQL gurus think / do?
I was hoping to have a simple insert and 'catch' any errors thrown.
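The "simple insert and catch" idea could look roughly like the sketch below. It assumes SQL Server 2012 or later (for THROW), a unique index or constraint on Word, and a WordID that is generated automatically; the sample value is made up for illustration.

```sql
-- Assumption: Words has a unique index/constraint on Word and WordID is
-- auto-generated. @Word would normally be a stored procedure parameter.
DECLARE @Word NVARCHAR(400) = N'aardvark';

BEGIN TRY
    INSERT INTO Words (Word) VALUES (@Word);
END TRY
BEGIN CATCH
    -- 2627 = violation of a UNIQUE or PRIMARY KEY constraint
    -- 2601 = duplicate key row in a unique index
    IF ERROR_NUMBER() NOT IN (2627, 2601)
    BEGIN
        THROW;  -- re-raise anything that is not a duplicate-key error
    END
END CATCH;
```

With this approach a concurrent insert of the same word is not prevented; the loser of the race simply hits the duplicate-key error, which is swallowed.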
On the Table Designer menu, select Indexes/Keys. In the Indexes/Keys dialog box, select Add. In the grid under General, select Type and choose Unique Key from the drop-down list box to the right of the property, and then select Close.
Use the INSERT IGNORE command rather than the INSERT command. If a record doesn't duplicate an existing record, then MySQL inserts it as usual. If the record is a duplicate, then the IGNORE keyword tells MySQL to discard it silently without generating an error.
The syntax for creating a unique constraint using an ALTER TABLE statement in SQL Server is: ALTER TABLE table_name ADD CONSTRAINT constraint_name UNIQUE (column1, column2, ...
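Applied to the table in the question, that might look like the following. The constraint name UQ_Words_Word is a hypothetical choice, and the statement assumes the table holds no duplicate words yet.

```sql
-- Hypothetical constraint name; fails if Words already contains duplicates.
-- NVARCHAR(400) is at most 800 bytes, which fits within SQL Server's
-- index key size limit.
ALTER TABLE Words
    ADD CONSTRAINT UQ_Words_Word UNIQUE (Word);
```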
Your solution:
INSERT INTO Words (Word) SELECT @Word WHERE NOT EXISTS (SELECT WordID FROM Words WHERE Word = @Word)
...is about as good as it gets. You could simplify it to this:
INSERT INTO Words (Word) SELECT @Word WHERE NOT EXISTS (SELECT * FROM Words WHERE Word = @Word)
...because EXISTS doesn't actually need to return any records, so the query optimiser won't bother looking at which fields you asked for.
As you mention, however, this isn't particularly performant as it stands, because it'll lock the whole table during the INSERT. If you add a unique index (it doesn't need to be the primary key) on Word, however, it'll only need to lock the relevant pages.
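One commonly suggested variation, which is not part of the answer above, is to add lock hints to the existence check so that two sessions cannot both pass the NOT EXISTS test for the same word. A minimal sketch, assuming an index on Word so that the range lock stays narrow:

```sql
DECLARE @Word NVARCHAR(400) = N'aardvark';  -- sample value for illustration

INSERT INTO Words (Word)
SELECT @Word
WHERE NOT EXISTS (
    SELECT *
    FROM Words WITH (UPDLOCK, HOLDLOCK)  -- hold an update/range lock until the statement's transaction ends
    WHERE Word = @Word
);
```

HOLDLOCK gives the check serializable semantics, so a second session inserting the same word waits rather than racing. Whether the extra blocking is acceptable under heavy load is something to measure rather than assume.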
Your best option is to simulate the expected load and look at the performance with SQL Server Profiler. As with any other field, premature optimisation is a bad thing. Define acceptable performance metrics, and then measure before doing anything else.
If that's still not giving you adequate performance, then there are a number of techniques from the data warehousing field that could help.
I think I've found a better (or at least faster) answer to this. Create an index like:
CREATE UNIQUE NONCLUSTERED INDEX [IndexTableUniqueRows] ON [dbo].[table]
(
    [Col1] ASC,
    [Col2] ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = ON, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
Include all the columns that define uniqueness. The important part is IGNORE_DUP_KEY = ON. With it on, an insert that would violate uniqueness raises a warning instead of an error, and the duplicate rows are discarded. SSIS ignores these warnings and you can still use fastload too.
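For the Words table from the question, the same idea might be sketched as follows. The index name IX_Words_Word and the sample word are assumptions for illustration, and the table is assumed to contain no duplicate words when the index is created.

```sql
-- Hypothetical index name; only the IGNORE_DUP_KEY option matters here.
CREATE UNIQUE NONCLUSTERED INDEX IX_Words_Word
    ON dbo.Words (Word)
    WITH (IGNORE_DUP_KEY = ON);

-- The first insert adds the row if the word is not already present.
INSERT INTO dbo.Words (Word) VALUES (N'aardvark');

-- Repeating it does not raise an error; the duplicate row is discarded
-- with the warning "Duplicate key was ignored."
INSERT INTO dbo.Words (Word) VALUES (N'aardvark');
```

The trade-off is that the caller gets no error to react to, so it cannot easily tell whether a given row was inserted or skipped.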