Binary datatype for Primary Keys Vs Int?

Are there performance (or other) issues with using the BINARY data type for primary keys? The database has many large tables that are regularly joined on these keys. The indexes are clustered. I believe these keys can't be incremented automatically (the way an IDENTITY column can).

John asked Sep 09 '13 23:09





1 Answer

In SQL Server the Primary Key is by default also the key for the clustered index.

The Primary Key itself only needs to be unique and not nullable. There are no other restrictions.

The clustered index key, however, should be as short as possible, because an index's depth is directly affected by the length of its key. That is true for any index type, but the clustered index key is also automatically appended to the key of every other index on that table, which multiplies the negative effect of a long key. In most cases an ever-increasing value is also preferred. That means in most cases an INT IDENTITY is a good choice.
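To make the link between key length and index depth concrete, here is a rough back-of-the-envelope sketch in Python. It is not how SQL Server sizes its indexes internally: the usable page size and the per-entry overhead are assumptions picked for illustration, `index_depth` is a hypothetical helper, and every level is treated as holding key-only entries.

```python
import math

PAGE_BYTES = 8096   # assumption: usable bytes on an 8 KB page
ROW_OVERHEAD = 11   # assumption: per-entry overhead (pointer, header, slot)

def index_depth(row_count: int, key_bytes: int) -> int:
    """Estimate the number of B-tree levels, leaf level included."""
    # Entries that fit on one page; shrinks as the key gets longer.
    fanout = PAGE_BYTES // (key_bytes + ROW_OVERHEAD)
    pages = math.ceil(row_count / fanout)   # pages at the leaf level
    depth = 1
    # Add one level at a time until a single root page remains.
    while pages > 1:
        pages = math.ceil(pages / fanout)
        depth += 1
    return depth

if __name__ == "__main__":
    for key_bytes in (4, 16, 64):
        print(key_bytes, "byte key:", index_depth(100_000_000, key_bytes), "levels")
```

Under these assumptions a longer key lowers the fan-out per page, so the same row count needs more pages and eventually an extra level. The exact numbers depend on the real row layout, so treat the output as a trend rather than a measurement.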

If your primary key is nonclustered, keeping it short is less important. However, you are using it for joins, which means you probably have an index on this key in each child table too, multiplying the problem again. So again, an automatically increasing surrogate key is probably the better choice.
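The multiplication effect can be put into rough numbers. The sketch below is plain arithmetic, not a measurement: `key_footprint_bytes` is a hypothetical helper, and it assumes the key is stored once per parent row, once more per parent row for each additional index, and twice per child row (in the row itself and in the index on the referencing column).

```python
def key_footprint_bytes(key_bytes, parent_rows, parent_secondary_indexes, child_row_counts):
    """Rough count of bytes spent storing copies of the key value."""
    # Parent table: one copy in the clustered index, one per additional index.
    parent = parent_rows * key_bytes * (1 + parent_secondary_indexes)
    # Child tables (assumption): one copy in each row, one in the index on that column.
    children = sum(rows * key_bytes * 2 for rows in child_row_counts)
    return parent + children

if __name__ == "__main__":
    # Hypothetical sizes: 10M parent rows, 3 extra indexes, two child tables.
    for key_bytes in (4, 16):
        total = key_footprint_bytes(key_bytes, 10_000_000, 3, [50_000_000, 200_000_000])
        print(key_bytes, "byte key:", total / 1024**2, "MiB")
```

The footprint scales linearly with key width, so widening the key from 4 to 16 bytes quadruples every one of those copies.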

All of this is true for many, if not most, cases. However, there are always exceptions. You do not give much information about your use case, so the answer has to be general in nature. Make sure you test the performance of reads as well as modifications in your environment with realistic data before deciding which way to go.

As a final remark, a 4-byte BINARY and an INT are probably very close in performance. You might see a difference if the values are not generated in increasing binary sort order. That can cause page splits during insert operations and therefore hurt your write performance.
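One way an increasing counter can end up out of binary sort order is byte order. The Python sketch below packs the same increasing integers as 4-byte values in big-endian and little-endian form, then checks whether the resulting byte strings are still increasing when compared byte by byte from the left. It assumes binary values are ordered that way; the helper names are hypothetical, and the count of out-of-order inserts is only a crude proxy for page-split risk.

```python
import struct

def is_increasing(keys):
    """True if every key sorts strictly after the one before it."""
    return all(a < b for a, b in zip(keys, keys[1:]))

def out_of_order_inserts(keys):
    """Count keys that sort before the largest key inserted so far."""
    count = 0
    largest = None
    for key in keys:
        if largest is not None and key < largest:
            count += 1   # would land in the middle of the index, not at the end
        else:
            largest = key
    return count

ids = list(range(250, 260))                              # increasing integers
big_endian = [struct.pack(">I", i) for i in ids]         # most significant byte first
little_endian = [struct.pack("<I", i) for i in ids]      # least significant byte first

if __name__ == "__main__":
    print("big-endian increasing:   ", is_increasing(big_endian))
    print("little-endian increasing:", is_increasing(little_endian))
    print("little-endian out-of-order inserts:", out_of_order_inserts(little_endian))
```

With big-endian packing the byte strings keep the order of the integers, so new rows would always go at the end of the index. With little-endian packing the order breaks as soon as the low byte wraps around (255 to 256 here), which is the kind of pattern that leads to inserts in the middle of existing pages.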

Sebastian Meine answered Oct 05 '22 00:10
