Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

in general, should every table in a database have an identity field to use as a PK?

This seems like a duplicate even as I ask it, but I searched and didn't find it. It seems like a good question for SO -- even though I'm sure I can find it on many blogs etc. out there. SO will have more of a debate than you can get on a blog.

I'm running into an issue with a join: getting back too many records. I think of this as "expansion". I added a table to the set of joins and the number of rows expanded, way too much. Usually when this happens I add a select of all the ID fields that are involved in the join. That way it's pretty obvious where the expansion is happening and I can change the ON of the join to fix it. Except in this case, the table that I added doesn't have an ID field. To me, this is a problem. But perhaps I'm wrong.

The question: should every table in a database have an IDENTITY field that's used as the PK? Are there any drawbacks to having an ID field in every table? What if you're reasonably sure this table will never be used in a PK/FK relationship?

Related, but not duplicate: When having an identity column is not a good idea?

Apparently this debate has been going on for a while. Shoulda known.

This post (surrogate vs. natural keys) is also relevant.

like image 917
jcollum Avatar asked Jul 30 '09 17:07

jcollum


People also ask

Should all tables have identity column?

No. There are cases when a database table should NOT have an IDENTITY field as a PRIMARY KEY . If your PRIMARY KEY is composite (like in many-to-many link tables) If your PRIMARY KEY is natural (like, a state code)

Is it necessary to have a primary key in every table?

Every table can have (but does not have to have) a primary key. The column or columns defined as the primary key ensure uniqueness in the table; no two rows can have the same key. The primary key of one table may also help to identify records in other tables, and be part of the second table's primary key.

Why you use ID field in your database?

The id field is an example of a surrogate key. It is a good idea to use a surrogate key as a primary key in a database because it is totally unrelated to and therefore unaffected by external events in the real world.

Can an identity column be used as the primary key of a table?

An identity column differs from a primary key in that its values are managed by the server and usually cannot be modified. In many cases an identity column is used as a primary key; however, this is not always the case.

Should every table have an identity field that's used as the PK?

The IDENTITY column should not necessarily be a PRIMARY KEY, but most often it's used to fill the surrogate PRIMARY KEY s It may or may not be useful in any particular case. Therefore, the answer to your question: The question: should every table in a database have an IDENTITY field that's used as the PK? No.

Should I have an identity field as a primary key?

There are cases when a database table should NOT have an IDENTITY field as a PRIMARY KEY. Three cases come into my mind when it's not the best idea to have an IDENTITY as a PRIMARY KEY: If your PRIMARY KEY is composite (like in many-to-many link tables) If your PRIMARY KEY should be unique across databases (in this case you use GUID / UUID / NEWID)

Does every table need an identity column?

Saying not every table needs an identity column, and then using the example you provided is just silly. You are right, but the way it was presented would be just as ludicrous as me saying. Every table MUST have an Identity column.

Should every table in a database have a primary key?

Show activity on this post. I'm creating a database table and I don't have a logical primary key assigned to it. Should each and every table have a primary key? Show activity on this post. Short answer: yes. If you want your table to be clustered, you need some kind of a primary key.


2 Answers

There are two concepts that are close but should not be confused: IDENTITY and PRIMARY KEY

Every table (except for the rare conditions) should have a PRIMARY KEY, that is a value or a set of values that uniquely identify a row.

See here for discussion why.

IDENTITY is a property of a column in SQL Server which means that the column will be filled automatically with incrementing values.

Due to the nature of this property, the values of this column are inherently UNIQUE.

However, no UNIQUE constraint or UNIQUE index is automatically created on IDENTITY column, and after issuing SET IDENTITY_INSERT ON it's possible to insert duplicate values into an IDENTITY column, unless it had been explicity UNIQUE constrained.

The IDENTITY column should not necessarily be a PRIMARY KEY, but most often it's used to fill the surrogate PRIMARY KEYs

It may or may not be useful in any particular case.

Therefore, the answer to your question:

The question: should every table in a database have an IDENTITY field that's used as the PK?

is this:

No. There are cases when a database table should NOT have an IDENTITY field as a PRIMARY KEY.

Three cases come into my mind when it's not the best idea to have an IDENTITY as a PRIMARY KEY:

  • If your PRIMARY KEY is composite (like in many-to-many link tables)
  • If your PRIMARY KEY is natural (like, a state code)
  • If your PRIMARY KEY should be unique across databases (in this case you use GUID / UUID / NEWID)

All these cases imply the following condition:

You shouldn't have IDENTITY when you care for the values of your PRIMARY KEY and explicitly insert them into your table.

Update:

Many-to-many link tables should have the pair of id's to the table they link as the composite key.

It's a natural composite key which you already have to use (and make UNIQUE), so there is no point to generate a surrogate key for this.

I don't see why would you want to reference a many-to-many link table from any other table except the tables they link, but let's assume you have such a need.

In this case, you just reference the link table by the composite key.

This query:

CREATE TABLE a (id, data) CREATE TABLE b (id, data) CREATE TABLE ab (a_id, b_id, PRIMARY KEY (a_id, b_id)) CREATE TABLE business_rule (id, a_id, b_id, FOREIGN KEY (a_id, b_id) REFERENCES ab)  SELECT  * FROM    business_rule br JOIN    a ON      a.id = br.a_id 

is much more efficient than this one:

CREATE TABLE a (id, data) CREATE TABLE b (id, data) CREATE TABLE ab (id, a_id, b_id, PRIMARY KEY (id), UNIQUE KEY (a_id, b_id)) CREATE TABLE business_rule (id, ab_id, FOREIGN KEY (ab_id) REFERENCES ab)  SELECT  * FROM    business_rule br JOIN    a_to_b ab ON      br.ab_id = ab.id JOIN    a ON      a.id = ab.a_id 

, for obvious reasons.

like image 123
Quassnoi Avatar answered Sep 20 '22 17:09

Quassnoi


Almost always yes. I generally default to including an identity field unless there's a compelling reason not to. I rarely encounter such reasons, and the cost of the identity field is minimal, so generally I include.

Only thing I can think of off the top of my head where I didn't was a highly specialized database that was being used more as a datastore than a relational database where the DBMS was being used for nearly every feature except significant relational modelling. (It was a high volume, high turnover data buffer thing.)

like image 22
Greg D Avatar answered Sep 20 '22 17:09

Greg D