I have a SQL Server table RealEstate with columns Id, Property, and Property_Value. This table has about 5-10 million rows and may grow further in the future. I want to insert a row only if the combination of Id, Property, Property_Value does not already exist in this table.
Example table:

1,Rooms,5
1,Bath,2
1,Address,New York
2,Rooms,2
2,Bath,1
2,Address,Miami

Inserting 2,Address,Miami should NOT be allowed, but 2,Price,2billion is okay. I am curious to know which is the "best" way to do this and why. The why part is most important to me. The two ways of checking are -
Is there any scenario where one would be better than the other?
Thanks.
PS: I know there is a similar question already, but it does not answer my problem: "Unique constraint vs pre checking". Also, I think that UNIQUE is applicable to all databases, so I don't think I should remove the mysql and oracle tags.
Without the unique constraint the optimizer has to interpolate from the statistics object created on that column, which is a perpetual source of execution plan problems. Of course the index created to enforce the unique constraint is a great source of performance improvement, too.
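To make the point about the enforcing index concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server here (an assumption made so the example is runnable), and its planner is not SQL Server's optimizer; the sketch shows only that the index behind the unique constraint is what serves the existence lookup.

```python
import sqlite3

# In-memory SQLite database, used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RealEstate ("
    " Id INTEGER NOT NULL,"
    " Property TEXT NOT NULL,"
    " Property_Value TEXT NOT NULL,"
    " UNIQUE (Id, Property, Property_Value))"
)

# Ask the planner how it would run the existence check.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT 1 FROM RealEstate "
    "WHERE Id = ? AND Property = ? AND Property_Value = ?",
    (2, "Address", "Miami"),
).fetchall()

# The human-readable step description is the last column of each plan row.
plan_text = " ".join(str(step[-1]) for step in plan)
uses_index = "INDEX" in plan_text
print(plan_text)
```

On SQL Server the equivalent check would be to inspect the actual execution plan for the lookup, which is outside what this sketch covers.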
The UNIQUE constraint ensures that the values in a column, or in a combination of columns, are unique. It prevents two or more rows from holding the same value (or the same combination of values).
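For the table in the question, the constraint would cover the combination of all three columns. The following is a minimal sketch with Python's sqlite3 module, with SQLite standing in for SQL Server (an assumption for runnability); the constraint name UQ_RealEstate is hypothetical.

```python
import sqlite3

# In-memory SQLite database, used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE RealEstate (
        Id             INTEGER NOT NULL,
        Property       TEXT    NOT NULL,
        Property_Value TEXT    NOT NULL,
        -- Hypothetical constraint name; uniqueness applies to the combination.
        CONSTRAINT UQ_RealEstate UNIQUE (Id, Property, Property_Value)
    )
    """
)

# The example rows from the question.
rows = [
    (1, "Rooms", "5"), (1, "Bath", "2"), (1, "Address", "New York"),
    (2, "Rooms", "2"), (2, "Bath", "1"), (2, "Address", "Miami"),
]
conn.executemany("INSERT INTO RealEstate VALUES (?, ?, ?)", rows)

# A new combination is accepted.
conn.execute("INSERT INTO RealEstate VALUES (?, ?, ?)", (2, "Price", "2billion"))

# An existing combination is rejected by the constraint.
try:
    conn.execute("INSERT INTO RealEstate VALUES (?, ?, ?)", (2, "Address", "Miami"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM RealEstate").fetchone()[0]
```

Note that each individual column may still repeat (Id 1 appears three times); only a repeat of the full triple is refused.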
A PRIMARY KEY constraint differs from a UNIQUE constraint in that you can create multiple UNIQUE constraints on a table but only one PRIMARY KEY per table. Another difference is that, in SQL Server, a UNIQUE constraint allows one NULL value (other engines, such as Oracle and MySQL, allow several), whereas a PRIMARY KEY does not allow NULL values at all.
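NULL handling under UNIQUE is engine-specific, so it is worth testing on the database you actually use. The sketch below uses Python's sqlite3 module with a hypothetical Listing table: it shows one PRIMARY KEY alongside two UNIQUE constraints, and that SQLite accepts several NULLs in a UNIQUE column. SQL Server would reject the second NULL, so this is an illustration of the difference, not of SQL Server's behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one PRIMARY KEY, two separate UNIQUE constraints.
conn.execute(
    """
    CREATE TABLE Listing (
        ListingId INTEGER PRIMARY KEY,
        Email     TEXT UNIQUE,
        Phone     TEXT UNIQUE
    )
    """
)

# SQLite treats NULLs as distinct, so both rows are accepted.
conn.execute("INSERT INTO Listing VALUES (1, NULL, '555-0100')")
conn.execute("INSERT INTO Listing VALUES (2, NULL, '555-0101')")
null_emails = conn.execute(
    "SELECT COUNT(*) FROM Listing WHERE Email IS NULL"
).fetchone()[0]

# A repeated non-NULL value still violates the second UNIQUE constraint.
try:
    conn.execute("INSERT INTO Listing VALUES (3, 'a@example.com', '555-0100')")
    duplicate_phone_rejected = False
except sqlite3.IntegrityError:
    duplicate_phone_rejected = True
```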
To check whether a unique constraint exists (in Oracle, via the user_constraints view), use the already provided method: select count(*) cnt from user_constraints uc where uc.table_name='YOUR_TABLE_NAME' and uc.
I think in most cases the differences between the two are going to be small enough that the choice should mostly be driven by picking the implementation that ends up being most understandable to someone looking at the code for the first time.
However, I think exception handling has a few small advantages:
Exception handling avoids a potential race condition. The 'check, then insert' method might fail if another process inserts a record between your check and your insert. So even if you're doing 'check, then insert' you still want exception handling on the insert, and if you're already doing exception handling anyway then you might as well do away with the initial check.
If your code is not a stored procedure and has to interact with the database via the network (i.e. the application and the db are not on the same box), then you want to avoid having two separate network calls (one for the check and the other for the insert) and doing it via exception handling provides a straightforward way of handling the whole thing with a single network call. Now, there are tons of ways to do the 'check then insert' method while still avoiding the second network call, but simply catching the exception is likely to be the simplest way to go about it.
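The two approaches can be sketched side by side as follows. This is a minimal illustration using Python's sqlite3 module, with SQLite standing in for SQL Server (an assumption); the function names insert_with_precheck and insert_catching_duplicate are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server (assumption)
conn.execute(
    "CREATE TABLE RealEstate ("
    " Id INTEGER NOT NULL,"
    " Property TEXT NOT NULL,"
    " Property_Value TEXT NOT NULL,"
    " UNIQUE (Id, Property, Property_Value))"
)

INSERT_SQL = "INSERT INTO RealEstate (Id, Property, Property_Value) VALUES (?, ?, ?)"
EXISTS_SQL = (
    "SELECT 1 FROM RealEstate"
    " WHERE Id = ? AND Property = ? AND Property_Value = ?"
)


def insert_with_precheck(conn, row):
    """Check, then insert: two statements (two round trips over a network)."""
    if conn.execute(EXISTS_SQL, row).fetchone() is not None:
        return False
    # Race window: another session could insert the same row at this point,
    # so the INSERT below can still fail unless its error is handled too.
    conn.execute(INSERT_SQL, row)
    return True


def insert_catching_duplicate(conn, row):
    """Insert and let the unique constraint decide: one statement."""
    try:
        conn.execute(INSERT_SQL, row)
        return True
    except sqlite3.IntegrityError:
        # Raised for any constraint violation (UNIQUE, NOT NULL, ...).
        # With valid non-NULL input, the cause here is a duplicate combination.
        return False


first = insert_catching_duplicate(conn, (2, "Address", "Miami"))   # new row
second = insert_catching_duplicate(conn, (2, "Address", "Miami"))  # duplicate
third = insert_with_precheck(conn, (2, "Address", "Miami"))        # duplicate
fourth = insert_with_precheck(conn, (2, "Price", "2billion"))      # new row
total = conn.execute("SELECT COUNT(*) FROM RealEstate").fetchone()[0]
```

Against SQL Server itself, the client code would typically inspect the error number rather than catch a broad exception type: a duplicate key is reported as error 2627 (unique constraint) or 2601 (unique index).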
On the other hand, exception handling requires a unique constraint (which is really a unique index), which comes with a performance tradeoff:
I'd also note that if you're in a situation where what you actually want to do is 'update else insert' (i.e. if a record with the unique value already exists then you want to update that record, else you insert a new record) then what you actually want to use is your particular database's UPSERT method, if it has one. For SQL Server and Oracle, this would be a MERGE statement.
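As a rough illustration of 'update else insert', here is a sketch using Python's sqlite3 module. It deliberately changes the scenario from the question: uniqueness is assumed on (Id, Property) only, so that Property_Value is something to update. SQLite's INSERT ... ON CONFLICT syntax (available from SQLite 3.24) stands in for MERGE, which has a different syntax on SQL Server and Oracle; the upsert function name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption for this sketch: one value per (Id, Property) pair.
conn.execute(
    "CREATE TABLE RealEstate ("
    " Id INTEGER NOT NULL,"
    " Property TEXT NOT NULL,"
    " Property_Value TEXT NOT NULL,"
    " UNIQUE (Id, Property))"
)


def upsert(conn, id_, prop, value):
    """Insert the row, or update Property_Value if (Id, Property) exists."""
    conn.execute(
        """
        INSERT INTO RealEstate (Id, Property, Property_Value)
        VALUES (?, ?, ?)
        ON CONFLICT (Id, Property)
        DO UPDATE SET Property_Value = excluded.Property_Value
        """,
        (id_, prop, value),
    )


upsert(conn, 2, "Rooms", "2")  # no matching row yet: inserts
upsert(conn, 2, "Rooms", "3")  # matching row exists: updates it in place
result = conn.execute(
    "SELECT Id, Property, Property_Value FROM RealEstate"
).fetchall()
```

Like the exception-handling approach, this relies on a unique constraint being present, and it resolves the conflict in a single statement.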