Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

VARCHAR as foreign key/primary key in database good or bad?

The problem with VARCHAR being used for any KEY is that they can hold WHITE SPACE. White space consists of ANY non-screen-readable character, like spaces tabs, carriage returns etc. Using a VARCHAR as a key can make your life difficult when you start to hunt down why tables aren't returning records with extra spaces at the end of their keys.

Sure, you CAN use VARCHAR, but you do have to be very careful with the input and output. They also take up more space and are likely slower when doing a Queries.

Integer types have a small list of 10 characters that are valid, 0,1,2,3,4,5,6,7,8,9. They are a much better solution to use as keys.

You could always use an integer-based key and use VARCHAR as a UNIQUE value if you wanted to have the advantages of faster lookups.


My 2 cents:

From a performance perspective, using CHAR or VARCHAR as primary key or index is a nightmare.

I've tested compound primary keys (INT + CHAR, INT + VARCHAR, INT + INT) and by far INT + INT was the best performance (loading a data warehouse). Lets say about twice more performance if you keep only numeric primary keys/indexes.


When I'm doing design work I ask myself: have I got anything in this data that I can guarantee is going to be non-NULL, unique, and unchanging? If so that's a candidate to be the primary key. If not, I know I have to generate a key value to use. Assuming, then, that my candidate key happens to be a VARCHAR I then look at the data. Is it reasonably short in length (meaning, say, 20 characters or less)? Or is the VARCHAR field rather long? If it's short it's usable as a key - if it's long, perhaps it's better to not use it as a key (although if it's in consideration for being the primary key I'm probably going to have to index it anyways). At least part of my concern is that the primary key is going to have to be indexed and will perhaps be used as a foreign key from some other table. Comparisons of VARCHAR fields tend to be slower than the comparison of numeric fields (particularly binary numeric fields such as integers) so using a long VARCHAR field as a key may result in slow performance. YMMV.


with an int you can store up to 2 billion in 4 bytes with varchars you cannot you need to have 10 bytes or so to store that, if you use varchars there is also a 2 byte overhead

so now you add up the 6 extra bytes in every PK and FK + the 2 byte varchar overhead


If you make the category name into the ID you will have a problem if you ever decide to rename a category.


I would say it is fine to use VARCHAR as both PRIMARY and FOREIGN KEYS.

Only issue I could forsee is if you have a table, lets say Instruments (share instruments) and you create the PRIMARY/FOREIGN KEY as VARCHAR, and it happens that the CODE changes.

This does happen on Stock Exchanges, and would require you to rename all references to this CODE, where as a ID nr would not require this from you.

So to conclude, I would say this dependes on your intended use.

EDIT

When I say CODE, I mean the Ticker Code for lets say GOOG, or any other share. It is possible for these codes to change over time, lets say you look at Dirivative/Future instruments.