
Why isn't INT more efficient than UNIQUEIDENTIFIER (according to the execution plan)?

Tags:

sql-server

I have a parent table and child table where the columns that join them together are the UNIQUEIDENTIFIER type.

The child table has a clustered index on the column that joins it to the parent table (the parent's PK, which is also clustered).

I have created a copy of both of these tables, changed the relationship columns to INTs instead, and rebuilt the indexes so that they have essentially the same structure and can be queried in the same way.

When I query for 20 known records from the parent table, pulling in all the related records from the child table, I get identical query costs for both versions, i.e. a 50/50 cost split between the batches.

If this is true, then my giant project to change all of the tables like this appears to be pointless, other than speeding up inserts. Can anyone shed any light on the situation?


EDIT:

The question is not about which is more efficient, but why is the query execution plan showing both queries as having the same cost?

cjk asked Mar 19 '10 09:03



1 Answer

Seeking a key in a clustered index costs basically the same for a 4-byte key, a 16-byte key, or a 160-byte key. The cost of comparing the slots with the predicate is just noise in the overall cost of the query (execution preparation, preparing the execution context, opening the rowsets, locating the pages, etc.), even when no IO is involved.
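One way to see why a handful of seeks looks the same is to estimate how many index levels each seek has to walk. The sketch below is only a back-of-the-envelope model in Python: the usable page size, the per-entry overhead, the 100-byte row payload, and the function name `btree_levels` are all assumptions chosen for illustration, not figures taken from SQL Server's documentation or from the question.

```python
import math

PAGE_BYTES = 8096      # assumed usable bytes per 8 KB page
ENTRY_OVERHEAD = 9     # assumed non-leaf overhead per entry (pointer, header, slot)


def btree_levels(row_count, key_bytes, payload_bytes=100):
    """Estimate how many pages one seek touches, leaf level included."""
    rows_per_leaf = PAGE_BYTES // (key_bytes + payload_bytes)
    pages = math.ceil(row_count / rows_per_leaf)
    fanout = PAGE_BYTES // (key_bytes + ENTRY_OVERHEAD)
    levels = 1
    # Add one non-leaf level at a time until a single root page remains.
    while pages > 1:
        pages = math.ceil(pages / fanout)
        levels += 1
    return levels


for key_bytes in (4, 16):
    print(key_bytes, "byte key:", btree_levels(1_000_000, key_bytes), "levels")
```

Under these assumptions a million-row table comes out at the same depth for both key widths, so each seek reads the same number of pages. At much larger row counts the wider key can tip the tree into one extra level, which is still a small effect next to the fixed per-query overhead.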

While no one will argue that GUIDs and INTs are on equal footing, comparing just 20 seeks will not reveal the differences. One thing you can measure immediately is space: a saving of 12 bytes per row and per non-leaf entry in the clustered index, plus 12 bytes per leaf entry in every non-clustered index (each carries the clustering key), adds up over millions of rows and tens of tables and indexes. Less space means less IO, better memory cache performance, better goodness overall, and that can be measured, but you need to measure real loads, not a puny 20-row seek.
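The space argument is simple arithmetic, roughly sketched here in Python. The row count and the number of non-clustered indexes are made-up inputs, `bytes_saved` is a hypothetical helper, and the estimate deliberately ignores non-leaf pages, so treat the result as a lower bound rather than a measurement.

```python
def bytes_saved(row_count, nonclustered_indexes, guid_bytes=16, int_bytes=4):
    """Rough lower bound on space saved by switching a clustering key from GUID to INT."""
    per_entry = guid_bytes - int_bytes      # 12 bytes per stored copy of the key
    # One copy in the clustered index row, plus one per non-clustered index leaf entry.
    copies_per_row = 1 + nonclustered_indexes
    return per_entry * row_count * copies_per_row


# Hypothetical table: 50 million rows with 3 non-clustered indexes.
saved = bytes_saved(50_000_000, 3)
print(f"{saved / 1024**3:.2f} GiB saved on one table")
```

Multiply that by every table and every referencing foreign key column in the schema to get a feel for the total; the real figure should come from measuring the rebuilt indexes.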

Under lab conditions you will be able to measure the difference in raw speed between seeking an INT and a GUID, but that shouldn't be your focus. The INT vs. GUID argument is not driven by something like a 5% performance gain in a seek; it is driven by space savings and by GUID randomness leading to fragmentation. Both are very easy to measure and make a solid case for INT on their own, with no need to bring in the seek performance argument.
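The fragmentation effect can be illustrated with a toy simulation, sketched here in Python. This is not how SQL Server's storage engine is implemented: pages are plain sorted lists with an assumed capacity of 100 rows, random 128-bit integers stand in for GUIDs, and the only rule modeled is that an insert into a full page in the middle of the index moves half of its rows to a new page, while an insert past the end of the index simply starts a fresh page.

```python
import bisect
import random


def simulate_inserts(keys, capacity=100):
    """Insert non-negative keys into fixed-size pages; return (pages, splits, avg_fill)."""
    pages = [[]]        # pages in key order, each a sorted list of keys
    separators = [-1]   # separators[i] is a lower bound for the keys in pages[i]
    splits = 0
    for key in keys:
        i = bisect.bisect_right(separators, key) - 1
        page = pages[i]
        if len(page) == capacity:
            if i == len(pages) - 1 and key > page[-1]:
                # Insert past the end of the index: new page, no rows move.
                pages.append([key])
                separators.append(key)
                continue
            # Full page in the middle: move the upper half to a new page.
            mid = capacity // 2
            upper = page[mid:]
            del page[mid:]
            pages.insert(i + 1, upper)
            separators.insert(i + 1, upper[0])
            splits += 1
            if key >= upper[0]:
                page = upper
        bisect.insort(page, key)
    avg_fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return len(pages), splits, avg_fill


rng = random.Random(0)  # fixed seed so the run is repeatable
sequential = simulate_inserts(range(20_000))                              # INT-identity-like keys
scattered = simulate_inserts(rng.getrandbits(128) for _ in range(20_000)) # GUID-like keys
print("sequential:", sequential)
print("random:    ", scattered)
```

In this model the sequential keys fill every page completely and never split, while the random keys split repeatedly and leave pages partly empty, so the same rows occupy more pages. On a real system the equivalent numbers come from measuring index fragmentation and page density under an actual insert workload.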

Remus Rusanu answered Nov 15 '22 08:11