I'm considering altering some tables to use nvarchar(50) as primary key instead of an int primary key. Using an int ID for a key really is irrelevant data, it's the string I'm interested in. What sort of performance hit will occur, or where do you research this? Other than cut and try that is.
nvarchar [ ( n | max ) ]: n defines the string size in byte-pairs and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^30-1 characters (2 GB). The storage size is two times n bytes + 2 bytes.
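As a quick sanity check on that storage formula, DATALENGTH reports the bytes a value actually occupies (the variable below is purely illustrative):

    -- N'Alice' is 5 characters; nvarchar stores 2 bytes per character
    DECLARE @name nvarchar(50) = N'Alice';
    SELECT DATALENGTH(@name) AS bytes_used,      -- 10 (5 characters * 2 bytes)
           LEN(@name)        AS character_count; -- 5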
Use varchar when the sizes of the column vary considerably, varchar(max) when the string length might exceed 8,000 bytes, char when the column size is fixed, and nvarchar when you need to store Unicode or multilingual data.
In some instances, an entity will have more than one attribute that could serve as a primary key. Any key, or minimal set of attributes, that could be a primary key is called a candidate key. Once candidate keys are identified, choose one, and only one, primary key for each entity.
Every table can have (but does not have to have) a primary key. The column or columns defined as the primary key ensure uniqueness in the table; no two rows can have the same key. The primary key of one table may also help to identify records in other tables, and be part of the second table's primary key.
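For example, in the following minimal sketch (table names are hypothetical), the parent's primary key also forms part of the child's composite primary key:

    CREATE TABLE Orders (
        OrderID int NOT NULL PRIMARY KEY
    );

    CREATE TABLE OrderLines (
        -- OrderID identifies the parent order and is half of this table's key
        OrderID    int NOT NULL REFERENCES Orders (OrderID),
        LineNumber int NOT NULL,
        PRIMARY KEY (OrderID, LineNumber)
    );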
You have hit upon one of the major "holy wars" of database design. The debate you're referring to is the "surrogate vs. natural key" argument, which has been raging for as long as there have been RDBMSs (as near as I can tell).
The debate essentially boils down to whether a representative key (surrogate, for example an IDENTITY column) should be used versus using the actual data that uniquely describes a record (natural key).
I will say that there is no "right" answer. Performance measures are an artifact of the platform, and should be assessed by experimentation, but performance is not likely to be the major concern.
What I consider to be the primary argument for surrogate keys is the immutability of primary keys. If you choose to use a natural key, you give up the option of altering that key after it is established, and you accept the risk that it may become non-unique at some point in the future. For those reasons, I typically (not always) use surrogate keys for most of my tables.
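To make the immutability point concrete, here is a hypothetical natural-key schema (names invented for illustration). If the key value ever has to change, the change ripples into every table that references it, whereas a surrogate key would let the business value be corrected in one place:

    CREATE TABLE Department (
        DeptCode nvarchar(50) NOT NULL PRIMARY KEY
    );

    CREATE TABLE Employee (
        EmployeeID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        DeptCode   nvarchar(50) NOT NULL
            REFERENCES Department (DeptCode) ON UPDATE CASCADE
    );

    -- Renaming the department now rewrites the key in every referencing row
    UPDATE Department SET DeptCode = N'Engineering' WHERE DeptCode = N'Eng';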
However, as I mentioned, there is a very long-standing debate, filled with discussions of indexing strategies and normal-form adherence, to be read if you are so inclined.
I would Google "surrogate vs. natural keys". Here are a few links to get you started:
Systems Engineering and RDBMS
Techrepublic
Tony Rogerson's blog
Hope this helps.
Consider using a surrogate key (an int primary key) as the primary key/clustered index key. The trouble with using nvarchar(50) as a primary key/clustered index key is that your table will be physically ordered by that key, which means it is likely to become highly fragmented, and every other index on the table will carry the burden of referencing this wide key.
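If you would rather measure that fragmentation than take it on faith, one option is to compare avg_fragmentation_in_percent for a clustered index on an int IDENTITY column versus one on the nvarchar(50) column after a realistic volume of inserts (dbo.MyTable is a placeholder):

    SELECT i.name AS index_name,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.MyTable'),
                                        NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id
     AND i.index_id  = ips.index_id;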
Another issue is that you will presumably need to JOIN to other tables on this value, which becomes a more expensive operation as the size of the key grows.
I think there are very few situations where an nvarchar(50) primary key would make sense.
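A common compromise, sketched here with invented names, is to keep a narrow int surrogate as the clustered primary key while still enforcing uniqueness on the string with a nonclustered unique constraint. The string stays searchable, but other indexes and foreign keys only have to carry the 4-byte key:

    CREATE TABLE dbo.Customer (
        CustomerID   int IDENTITY(1,1) NOT NULL,
        CustomerCode nvarchar(50)      NOT NULL,
        CONSTRAINT PK_Customer      PRIMARY KEY CLUSTERED (CustomerID),
        CONSTRAINT UQ_Customer_Code UNIQUE NONCLUSTERED (CustomerCode)
    );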
Generally, primary keys should be surrogates UNLESS you have a small, immutable natural key. An SSN, for example, could arguably be considered a natural immutable key.
For performance, I normally ask the following:
how many rows? 1,000, 1,000,000, or 10,000,000?
what server is it sitting on? (memory, disk space)
I would profile it and then see. In my experience, the bottleneck is usually not the database but poorly written or badly deployed code.
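If you do want a quick measurement before committing, one lightweight approach (the table and column names below are placeholders) is to compare logical reads and CPU time for the same join performed on the int key and on the nvarchar(50) key:

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT COUNT(*)
    FROM dbo.Orders   AS o
    JOIN dbo.Customer AS c
      ON c.CustomerID = o.CustomerID;  -- re-run joining on the nvarchar(50) columns to compare

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;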