Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Database Design and the use of non-numeric Primary Keys

I'm currently in the process of designing the database tables for a customer & website management application. My question is in regards to the use of primary keys as functional parts of a table (and not assigning "ID" numbers to every table just because).

For example, here are four related tables from the database so far, one of which uses the traditional primary key number, the others which use unique names as the primary key:

--
-- website
--
CREATE TABLE IF NOT EXISTS `website` (
  `name` varchar(126) NOT NULL,
  `client_id` int(11) NOT NULL,
  `date_created` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `notes` text NOT NULL,
  `website_status` varchar(26) NOT NULL,
  PRIMARY KEY  (`name`),
  KEY `client_id` (`client_id`),
  KEY `website_status` (`website_status`),
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

--
-- website_status
--
CREATE TABLE IF NOT EXISTS `website_status` (
  `name` varchar(26) NOT NULL,
  PRIMARY KEY  (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `website_status` (`name`) VALUES
('demo'),
('disabled'),
('live'),
('purchased'),
('transfered');

--
-- client
--
CREATE TABLE IF NOT EXISTS `client` (
  `id` int(11) NOT NULL auto_increment,
  `date_created` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `client_status` varchar(26) NOT NULL,
  `firstname` varchar(26) NOT NULL,
  `lastname` varchar(46) NOT NULL,
  `address` varchar(78) NOT NULL,
  `city` varchar(56) NOT NULL,
  `state` varchar(2) NOT NULL,
  `zip` int(11) NOT NULL,
  `country` varchar(3) NOT NULL,
  `phone` text NOT NULL,
  `email` varchar(78) NOT NULL,
  `notes` text NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `client_status` (`client_status`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;

--
-- client_status
---
CREATE TABLE IF NOT EXISTS `client_status` (
  `name` varchar(26) NOT NULL,
  PRIMARY KEY  (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `client_status` (`name`) VALUES
('affiliate'),
('customer'),
('demo'),
('disabled'),
('reseller');

As you can see, 3 of the 4 tables use their 'name' as the primary key. I know that these will always be unique. In 2 of the cases (the *_status tables) I am basically using a dynamic replacement for ENUM, since status options could change in the future, and for the 'website' table, I know that the 'name' of the website will always be unique.

I'm wondering if this is sound logic, getting rid of table ID's when I know the name is always going to be a unique identifier, or a recipe for disaster? I'm not a seasoned DBA so any feedback, critique, etc. would be extremely helpful.

Thanks for taking the time to read this!

like image 919
Nick Jennings Avatar asked May 29 '09 10:05

Nick Jennings


4 Answers

There are 2 reasons I would always add an ID number to a lookup / ENUM table:

  1. If you are referencing a single column table with the name then you may be better served by using a constraint
  2. What happens if you wanted to rename one of the client_status entries? e.g. if you wanted to change the name from 'affiliate' to 'affiliate user' you would need to update the client table which should not be necessary. The ID number serves as the reference and the name is the description.

In the website table, if you are confident that the name will be unique then it is fine to use as a primary key. Personally I would still assign a numeric ID as it reduces the space used in foreign key tables and I find it easier to manage.

EDIT: As stated above, you will run into problems if the website name is renamed. By making this the primary key you will be making it very difficult if not impossible for this to be changed at a later date.

like image 55
Macros Avatar answered Nov 11 '22 02:11

Macros


When making natural PRIMARY KEY's, make sure their uniqueness is under your control.

If you're absolutely sure you will never ever have uniqueness violation, then it's OK to use these values as PRIMARY KEY's.

Since website_status and client_status seem to be generated and used by you and only by you, it's acceptable to use them as a PRIMARY KEY, though having a long key may impact performance.

website name seems be under control of the outer world, that's why I'd make it a plain field. What if they want to rename their website?

The counterexamples would be SSN and ZIP codes: it's not you who generates them and there is no guarantee that they won't be ever duplicated.

like image 27
Quassnoi Avatar answered Nov 11 '22 02:11

Quassnoi


Kimberly Tripp has an Excellent series of blog articles (GUIDs as PRIMARY KEYs and/or the clustering key and The Clustered Index Debate Continues) on the issue of creating clustered indexes, and choosing the primary key (related issues, but not always exactly the same). Her recommendation is that a clustered index/primary key should be:

  1. Unique (otherwise useless as a key)
  2. Narrow (the key is used in all non-clustered indexes, and in foreign-key relationships)
  3. Static (you don't want to have to change all related records)
  4. Always Increasing (so new records always get added to the end of the table, and don't have to be inserted in the middle)

Using "Name" as your key, while it seems to satisfy #1, doesn't satisfy ANY of the other three.

Even for your "lookup" table, what if your boss decides to change all affiliates to partners instead? You'll have to modify all rows in the database that use this value.

From a performance perspective, I'm probably most concerned that a key be narrow. If your website name is actually a long URL, then that could really bloat the size of any non-clustered indexes, and all tables that use it as a foreign key.

like image 9
BradC Avatar answered Nov 11 '22 02:11

BradC


Besides all the other excellent points that have already been made, I would add one more word of caution against using large fields as clustering keys in SQL Server (if you're not using SQL Server, then this probably doesn't apply to you).

I add this because in SQL Server, the primary key on a table by default also is the clustering key (you can change that, if you want to and know about it, but most of the cases, it's not done).

The clustering key that determines the physical ordering of the SQL Server table is also being added to every single non-clustered index on that table. If you have only a few hundred to a few thousand rows and one or two indices, that's not a big deal. But if you have really large tables with millions of rows, and potentially lots of indices to speed up the queries, this will indeed cause a lot of disk space and server memory to be wasted unnecessarily.

E.g. if your table has 10 million rows, 10 non-clustered indices, and your clustering key is 26 bytes instead of 4 (for an INT), then you're wasting 10 mio. by 10 by 22 bytes for a total of 2.2 billion bytes (or 2.2 GBytes approx.) - that's not peanuts anymore!

Again - this only applies to SQL Server, and only if you have really large tables with lots of non-clustered indices on them.

Marc

like image 3
marc_s Avatar answered Nov 11 '22 02:11

marc_s