Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Will SQL Server 2005 penalize me for using an nvarchar(50) as a primary key, instead of an integer?

I'm considering altering some tables to use nvarchar(50) as primary key instead of an int primary key. Using an int ID for a key really is irrelevant data, it's the string I'm interested in. What sort of performance hit will occur, or where do you research this? Other than cut and try that is.

like image 864
P a u l Avatar asked Oct 22 '08 05:10

P a u l


People also ask

What is the maximum length of NVARCHAR in SQL Server?

nvarchar [ ( n | max ) ] n defines the string size in byte-pairs, and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^30-1 characters (2 GB). The storage size is two times n bytes + 2 bytes.

When would you use the NVARCHAR data type within a table?

You can use SQL varchar when the sizes of the column vary considerably, use varchar(max) when there are chances that string length might exceed 8000 bytes, use char when the sizes of the column are fixed and use nvarchar if there is a requirement to store Unicode or multilingual data.

Do primary keys have to be a single attribute?

In some instances, an entity will have more than one attribute that can serve as a primary key. Any key or minimum set of keys that could be a primary key is called a candidate key. Once candidate keys are identified, choose one, and only one, primary key for each entity.

Is primary key mandatory in SQL?

Every table can have (but does not have to have) a primary key. The column or columns defined as the primary key ensure uniqueness in the table; no two rows can have the same key. The primary key of one table may also help to identify records in other tables, and be part of the second table's primary key.


3 Answers

You have hit upon one of the major "holy wars" of database design. The debate you're referring to is the "surrogate vs. natural key" argument that's been raging for as long as there have been RDBMSs (as nearly as I can tell).

The debate essentially boils down to whether a representative key (surrogate, for example an IDENTITY column) should be used versus using the actual data that uniquely describes a record (natural key).

I will say that there is no "right" answer. Performance measures are an artifact of the platform, and should be assessed by experimentation, but performance is not likely to be the major concern.

What I consider to be the primary argument for surrogate keys is the immutability of primary keys. If you choose to use a natural key, you give up the option of altering that key after it is established. You also give up the possibility that it may become non-unique at some point in the future. For those reasons, I typically (not always) use surrogate keys for most of my tables.

However, as I mentioned, there is a very long-standing debate filled with discussions of indexing strategies and normal-form adherance to be read if you are so inclined.

I would Google "surrogate vs. natural keys". Here are a few links to get you started:

Systems Engineering and RDBMS

Techrepublic

Tony Rogerson's blog

Hope this helps.

like image 80
Jared Avatar answered Oct 13 '22 23:10

Jared


Consider using a surrogate key (an int primary key) as the primary key/clustered index key. The trouble with using a nvarchar(50) as a primary key/clustered index key is that your table will be ordered by that key which means it is likely to get highly fragmented, and that any other indexes will have the burden of referencing this heavy primary key.

Another issue is that presumably you are needing to JOIN on other tables by this type of value which is a more expensive operation as the size of the key grows.

I think there are very few situations where an nvarchar(50) primary key would make sense.

Generally, primary keys should be a surrogate UNLESS you have a small natural immutable key. Arguably, SSN, for example, could be considered a natural immutable key.

like image 5
Hafthor Avatar answered Oct 13 '22 22:10

Hafthor


For performance, I normally ask the following:

  • how many rows? 1,000 or 1,000,000 or 10,000,000 ??

  • what server is it sitting on? (memory, diskspace)

I would profile it and then see. Normally for me, the bottleneck is not the database, it's poorly written code, badly deployed etc. etc...

like image 1
Christian Payne Avatar answered Oct 13 '22 23:10

Christian Payne