
Composite primary key

I am working on the design of a database that will store data originating from a number of different sources. The instances I am storing are assigned unique IDs by the original sources. Each instance I store should contain information about the source it came from, along with the ID it was assigned by that source.

As an example, consider the following table that illustrates the problem:

----------------------------------------------------------------
| source_id | id_on_source | data                              |
----------------------------------------------------------------
| 1         | 17600        | ...                               |
| 1         | 17601        | ...                               |
| 2         | 1            | ...                               |
| 3         | 1            | ...                               |
----------------------------------------------------------------

Note that while the id_on_source is unique for each source, it is possible for the same id_on_source to be found for different sources.

I have a decent understanding of relational databases, but am far from an expert or even an experienced user. The problem I face with this design is what I should use as the primary key. The data seems to dictate the use of a composite primary key of (source_id, id_on_source). After a little googling, however, I found some heated debates on the pros and cons of composite primary keys, which left me a little confused.

The table will have one-to-many relationships with other tables, and will thus be referenced by foreign keys in those tables.

I am not tied to a specific RDBMS and I am not sure if it matters for the sake of the argument, but let's say that I prefer to work with SQLite and MySQL.

What are the pros and cons of using a composite primary key in this case? Which would you prefer?

asked Sep 05 '09 11:09 by TC.


4 Answers

Composite keys are harder to manage and slower to join than a single-column key. Since you're building a summary table, use a surrogate key (i.e., an autoincrement/identity column) as the primary key. Keep your natural key columns in the table as ordinary columns.

This has a lot of other benefits, too. Primarily, if you merge with a company and they have one of the same sources, but reused keys, you're going to get into trouble if you aren't using a surrogate key.

This is the widely acknowledged best practice in data warehousing (a much larger undertaking than what you're doing, but still relevant), and for good reason. Surrogates provide data integrity and quick joins. You can get burned very quickly with natural keys, so avoid using them as identifiers and use them only during the import process.
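
As a rough sketch of "natural keys only during import", the following uses Python's built-in sqlite3 module. The table name `instance` and the helper `import_row` are made up for illustration; only `source_id`, `id_on_source` and `data` come from the question. The natural key is consulted once, at import time, to find or create the row; everything afterwards can work with the surrogate `id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instance (                 -- hypothetical table name
        id           INTEGER PRIMARY KEY,   -- surrogate key
        source_id    INTEGER NOT NULL,      -- natural key, part 1
        id_on_source INTEGER NOT NULL,      -- natural key, part 2
        data         TEXT,
        UNIQUE (source_id, id_on_source)
    )
""")

def import_row(conn, source_id, id_on_source, data):
    """Hypothetical import helper: look the row up by its natural key,
    update it if it exists, insert it otherwise. Returns the surrogate id."""
    row = conn.execute(
        "SELECT id FROM instance WHERE source_id = ? AND id_on_source = ?",
        (source_id, id_on_source),
    ).fetchone()
    if row is not None:
        conn.execute("UPDATE instance SET data = ? WHERE id = ?", (data, row[0]))
        return row[0]
    cur = conn.execute(
        "INSERT INTO instance (source_id, id_on_source, data) VALUES (?, ?, ?)",
        (source_id, id_on_source, data),
    )
    return cur.lastrowid

first = import_row(conn, 1, 17600, "original")
again = import_row(conn, 1, 17600, "re-imported")            # same natural key
other = import_row(conn, 2, 17600, "same id, other source")  # different source
```

Here `first` and `again` refer to the same row, while `other` gets its own surrogate id even though its `id_on_source` matches.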

answered Sep 30 '22 10:09 by Eric


I personally find composite primary keys to be painful. For every table that you wish to join to your "sources" table you will need to add both the source_id and id_on_source fields.

I would create a standard auto-incrementing primary key on your sources table and add a unique index on the source_id and id_on_source columns.

This then allows you to add just the id of the sources table as a foreign key on other tables.
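
A minimal sketch of that layout, using Python's built-in sqlite3 module. The table names `instance` (standing in for the "sources" table) and `note`, and the index name, are assumptions chosen for the example; they are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE instance (                       -- hypothetical name
        id           INTEGER PRIMARY KEY AUTOINCREMENT,
        source_id    INTEGER NOT NULL,
        id_on_source INTEGER NOT NULL,
        data         TEXT
    );
    -- The natural key stays unique even though it is not the primary key.
    CREATE UNIQUE INDEX ux_instance_natural
        ON instance (source_id, id_on_source);

    CREATE TABLE note (                           -- hypothetical child table
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        instance_id INTEGER NOT NULL REFERENCES instance (id),
        body        TEXT
    );
""")

cur = conn.execute(
    "INSERT INTO instance (source_id, id_on_source, data) VALUES (1, 17600, '...')"
)
instance_id = cur.lastrowid
conn.execute(
    "INSERT INTO note (instance_id, body) VALUES (?, ?)",
    (instance_id, "first note"),
)

# A second row with the same (source_id, id_on_source) violates the unique index.
try:
    conn.execute(
        "INSERT INTO instance (source_id, id_on_source, data) VALUES (1, 17600, 'dup')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The child table joins back on a single column.
joined = conn.execute("""
    SELECT i.source_id, i.id_on_source, n.body
    FROM note AS n JOIN instance AS i ON i.id = n.instance_id
""").fetchall()
```

The child table carries one foreign key column instead of two, at the cost of needing a join to find out which source a note belongs to.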

Generally, I have also found support for composite primary keys within many frameworks and tooling products to be "patchy" at best and non-existent at worst.

answered Sep 30 '22 09:09 by Steve Weet


You have a business requirement that the combination of those two attributes is unique. So, you should have a UNIQUE constraint on those two attributes. Whether you call that UNIQUE constraint "primary" is really just a preference; it doesn't have much impact aside from documentation.

The only question is whether you then add an extra column and mark it UNIQUE. The only reason I can see to do that is performance, which is a legitimate reason.
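
For comparison, here is a sketch of the design without the extra column, again with Python's built-in sqlite3 module. The table names `instance` and `note` and the column `note_no` are assumptions made up for the example. The composite key is declared as the primary key, and the child table references it with a composite foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE instance (                       -- hypothetical name
        source_id    INTEGER NOT NULL,
        id_on_source INTEGER NOT NULL,
        data         TEXT,
        PRIMARY KEY (source_id, id_on_source)
    );
    CREATE TABLE note (                           -- hypothetical child table
        source_id    INTEGER NOT NULL,
        id_on_source INTEGER NOT NULL,
        note_no      INTEGER NOT NULL,
        body         TEXT,
        PRIMARY KEY (source_id, id_on_source, note_no),
        FOREIGN KEY (source_id, id_on_source)
            REFERENCES instance (source_id, id_on_source)
    );
""")

conn.executemany(
    "INSERT INTO instance VALUES (?, ?, ?)",
    [(1, 17600, "..."), (2, 1, "..."), (3, 1, "...")],
)
conn.execute("INSERT INTO note VALUES (2, 1, 1, 'belongs to source 2')")

# A note pointing at a (source_id, id_on_source) pair that does not exist is rejected.
try:
    conn.execute("INSERT INTO note VALUES (9, 1, 1, 'no such instance')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The child rows carry the meaningful columns, so no join is needed to filter by source.
rows = conn.execute("SELECT body FROM note WHERE source_id = 2").fetchall()
```

Every referencing table repeats both key columns, but those columns mean something on their own and can be queried directly.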

Personally, I don't like the approach of turning every database into essentially a graph, where the generated columns are essentially pointers and you are just traversing from one to the next. I think that throws away all of the greatness of a relational system. If you step back and think about it, you're introducing a bunch of columns that have no meaning to your business, at all. You may be interested in my related blog post.

answered Sep 30 '22 10:09 by Jeff Davis


I believe that composite keys create a very natural and descriptive data model. My experience comes from Oracle, and I don't think there are any technical issues when creating a composite PK. In fact, anyone analysing the data dictionary would immediately understand something about the table. In your case it would be obvious that each id_on_source must be unique within its source_id.
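
To illustrate the data dictionary point, here is a small sketch that uses SQLite rather than Oracle (the asker mentioned SQLite), through Python's built-in sqlite3 module. The table name `instance` is an assumption. `PRAGMA table_info` reports, for each column, its position within the primary key, so the composite key can be read straight from the schema metadata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instance (                 -- hypothetical table name
        source_id    INTEGER NOT NULL,
        id_on_source INTEGER NOT NULL,
        data         TEXT,
        PRIMARY KEY (source_id, id_on_source)
    )
""")

# Each row is (cid, name, type, notnull, dflt_value, pk), where pk is 0 for
# non-key columns and otherwise the 1-based position within the primary key.
info = conn.execute("PRAGMA table_info(instance)").fetchall()
pk_columns = [
    name
    for _cid, name, _type, _notnull, _default, pk in sorted(info, key=lambda r: r[5])
    if pk > 0
]
```

Here `pk_columns` lists `source_id` followed by `id_on_source`, which is the uniqueness rule stated in the question.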

The use of natural keys often sparks heated debate, but the people I work with prefer natural keys from a data-modelling perspective.

answered Sep 30 '22 08:09 by softveda