Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I make an identity field span multiple tables in SQL Server?

Can I have an "identity" (unique, non-repeating) column span multiple tables? For example, let's say I have two tables: Books and Authors.

Authors
  AuthorID
  AuthorName
Books
  BookID
  BookTitle

The BookID column and the AuthorID column are identity columns. I want the identity part to span both columns. So, if there is an AuthorID with a value of 123, then there cannot be a BookID with a value of 123. And vice versa.

I hope that makes sense.

Is this possible?

Thanks.

Why do I want to do this? I am writing an APS.NET MVC app. I am creating a comment section. Authors can have comments. Books can have comments. I want to be able to pass an entity ID (a book ID or an author ID) to an action and have the action pull up all the corresponding comments. The action won't care if it's a book or an author or whatever. Sound reasonable?

like image 905
johnnycakes Avatar asked Mar 11 '10 20:03

johnnycakes


3 Answers

Even if you could put the identity sequence across multiple tables, your comment table is not going to be able to reference both columns in a single foreign key.

The best way to do this, in terms of relational database design theory, would be to create two comment tables. But obviously, you want to avoid that, probably for code-reuse reasons.

The most straightforward pragmatic approach would be to put two foreign key columns on the comment table, and just make one null and the other not null for each comment.

Another approach, which might be the best compromise, is this. You refer in your question to an "entity ID". So make an Entity table! Then the authors and books and comments can all refer to that table.

Edited to add:

Philip Kelley, Ray, and (I think) Artic have all suggested modifying the comment table by adding an entity_id, which can refer to either the book_id or the author_id, and a flag of some sort (char(1), tinyint, and boolean, respectively) that indicates which of these is being referred to.

This is not a good solution for many reasons, both pragmatic (including data integrity, reporting, efficiency) and theoretical.

The first and most obvious problem is the data integrity problem. A relational database system should always be responsible for maintaining the integrity of its own data, and there are natural and preferred ways that the DB is designed to do this. One of the most important of these mechanisms is the foreign key system. If the comment.entity_id column is to reference both book.book_id and author.author_id, then a foreign key cannot be created for this column.

Sure, you could put a check in your DML (insert, update, delete) stored procedures to verify the references, but that would quickly turn into a big mess, as all DML operations on all three tables would be involved.

And that leads us to the efficiency problem. Whenever a query is run against the comment table, it will require joins to either the author or book table or both. The query plan generation system will not have foreign keys available to optimize with, so its performance could very well be degraded.

Then there are problems with this scheme in reporting. Any report generating system is going to have trouble with this sort of system. Sure this won't be a problem for expert programmers, but any user ad-hoc reports are going to have to mock up the logic behind when the event_id means this or that, and it could be a pretty bad deal. Maybe you won't ever use report generating tools on this database. But then again, nobody knows where a database is going to be ultimately used. Why not work with the system to allow for anything?

And that leads us to the theoretical problems.

In relational database theory, each row (a.k.a. "tuple") in each table ("relation variable") represents a proposition about the real world. Designing a table is to decide the form of that proposition. Let's look at a few examples of how this might work.

comment (comment_id int, comment_type char(1), entity_id int, 
         user_id int, comment_text nvarchar(max), comment_date datetime)
/* comment_id identifies a comment (comment_text) that a user (user_id) 
   has made about a book (entity_id if comment_type = 'B') or author 
   (entity_id if comment_type = 'A') at a particular date and 
   time (comment_date).*/

Here it is clear that the column (or "attribute") called entity_id is doing double-duty. It doesn't really represent anything, except with reference to another column. This is workable, but unsatisfactory.

comment (comment_id int, book_id int, author_id int, user_id int, 
         comment_text nvarchar(max), comment_date datetime)
/* comment_id identifies a comment (comment_text) that a user (user_id) 
   has made about a book (book_id if not null) or author (author_id if 
   not null) at a particular date and time (comment_date). */

This buys us the foreign keys that are the biggest omission from the first version. But this still isn't terribly satisfactory, unless a single comment can refer to both a book and an author (which might be reasonable). Nullable columns are a warning sign that something is wrong with the design, and that may be the case here as well. A check constraint may be necessary to avoid a comment that refers to nothing at all, or to both a book and an author if that is not to be allowed.

From a theoretical perspective (and thus, my perspective :)) there is a clear best option:

book_comment (book_comment_id int, book_id int, user_id int, 
              comment_text nvarchar(max), comment_date datetime)
/* book_comment_id identifies a comment (comment_text) that a 
   user (user_id) has made about a book (book_id) at a particular 
   date and time (comment_date). */

author_comment (author_comment_id int, author_id int, user_id int, 
                comment_text nvarchar(max), comment_date datetime)
/* author_comment_id identifies a comment (comment_text) that a 
   user (user_id) has made about an author (author_id) at a particular 
   date and time (comment_date). */

This last option would provide the best efficiency, data integrity, and ease of reporting. And the only expense would be that the DML stored procedures would need to put the comments into the right tables, which is not a big deal, since they had to know what the comments were referring to anyway.

If your plan was to retrive all comments for a book or author at once, then you can easily create a view on top of these tables that reproduces the other designs, if that's what you want to do.

create view comments as 
select 
    book_comment_id as comment_id, 
    book_id as entity_id, 
    comment_text,
    'B' as comment_type
from book_comment
union
select 
    author_comment_id as comment_id, 
    author_id as entity_id, 
    comment_text,
     'A' as comment_type 
from author_comment
like image 123
Jeffrey L Whitledge Avatar answered Sep 27 '22 23:09

Jeffrey L Whitledge


Actually, Joe Celko suggests on this blog to use a custom sequence in your database, and then, for any primary key of your desired tables, specify their default values to get the next number from your custom sequence.

Here is a code sample from his blog:

CREATE SEQUENCE Service_Ticket_Seq
 AS INTEGER
 START WITH 1
 INCREMENT BY 1
 MINVALUE 1
 MAXVALUE 100
 CYCLE;

CREATE TABLE Meats
(ticket_seq INTEGER DEFAULT NEXT VALUE FOR Service_Ticket_Seq
       PRIMARY KEY,
 meat_type VARCHAR(15) NOT NULL);

CREATE TABLE Fish
(ticket_seq INTEGER DEFAULT NEXT VALUE FOR Service_Ticket_Seq
       PRIMARY KEY,
 fish_type VARCHAR(15) NOT NULL);

INSERT INTO Meats (meat_type) VALUES ('pig');
INSERT INTO Fish (fish_type) VALUES ('squid');

select * from Meats

select * from Fish

This being said, an identity field spanning multiple tables is possible in MS SQL.

like image 31
Daniel Cristian Avatar answered Sep 28 '22 01:09

Daniel Cristian


The short answer is: No, you can't do that (at least in MS SQL Server through 2008).

You could make a new table, "CommentableEntity", plug your identity column in there, then define foreign keys in Authors and Books to reference it as a parent table, and then do one of a number of tricks to ensure that a given ID value is not assigned to both tables... but this is a poor idea, because the data model you built would imply that Authors and Books are related kinds of data, and they really aren't.

You could have a separate table, Comments, have the identity column in there, and park a CommentId column in both Authors and Books. However, that would limit each book and author to only one comment.

Me, I'd probably add a column like "CommentorType" to the Comments table and a put a flag in there indicating source of comment ("A" for author, "B" for book). Build a primary key on "CommentorId + CommentorType", and it should work well enough -- and it'd be trivial to add further types of commentors as the system expands.

like image 24
Philip Kelley Avatar answered Sep 27 '22 23:09

Philip Kelley