Just as an update for Cliffs, Thanks ChaosPandion for the template.
Person
PersonID Int PK
Network
PersonID Int PK FK
OtherPersonID Int PK FK
OR
Person
PersonID Int PK
Network
PersonID Int PK FK
FriendID Int PK FK
Friend
FriendID Int PK
OtherPersonID Int FK
++++++ Original Post Below ++++++
Hi All,
I'm a web developer and have recently started a project with a company. Currently, I'm working with their DBA on getting the schema laid out for the site, and we've come to a disagreement regarding the design on a couple tables, and I'd like some opinions on the matter.
Basically, we are working on a site that will implement a "friends" network. All users of the site will be contained in a table tblUsers with (PersonID int identity PK, etc).
What I am wanting to do is to create a second table, tblNetwork, that will hold all of the relationships between users, with (NetworkID int identity PK, Owners_PersonID int FK, Friends_PersonID int FK, etc). Or conversely, remove the NetworkID, and have both the Owners_PersonID and Friends_PersonID shared as the Primary key.
This is where the DBA has his problem. Saying that "he would only implement this kind of architecture in a data warehousing schema, and not for a website, and this is just another example of web developers trying to take the easy way out."
Now obviously, his remark was a bit inflammatory, and that have helped motivate me to find an suitable answer, but more so, I'd just like to know how to do it right. I've been developing databases and programming for over 10 years, have worked with some top-notch minds, and have never heard this kind of argument.
What the DBA is wanting to do is instead of storing both the Owners_PersonId and Friends_PersonId in the same table, is to create a third table tblFriends to store the Friends_PersonId, and have the tblNetwork have (NetworkID int identity PK, Owner_PersonID int FK, FriendsID int FK(from TBLFriends)). All that tblFriends would house would be (FriendsID int identity PK, Friends_PersonID(related back to Persons)).
To me, creating the third table is just excessive in nature, and does nothing but create an alias for the Friends_PersonID, and cause me to have to add (what I view as unneeded) joins to all my queries, not to mention the extra cycles that will be necessary to perform the join on every query.
I understand that technically, what he is wanting is possible, but is it inline with best practice? What would be best practice?
Thanks for reading, appreciate comments.
Ryan
Yes, it is okay to have two fk to the same pk in one table.
FOREIGN KEY constraints can reference another column in the same table, and is referred to as a self-reference. A FOREIGN KEY constraint specified at the column level can list only one reference column. This column must have the same data type as the column on which the constraint is defined.
For example, the foreign key must reference a primary key or unique constraint, although that reference can be on the same table or on a different table.
A FOREIGN KEY constraint can only point to one table and each table can only have one PRIMARY KEY constraint. Or you can have multiple FOREIGN KEY constraints on the same column(s) referencing one PRIMARY KEY of a (different) table each.
If I understand you right, You're proposing:
Person PersonID PK
FriendList FriendListID, OwnerID, PersonID
The DBA proposes:
Person PersonID PK
FriendList FriendListID, OwnerID
FriendListEntry FriendListID, PersonID
Your approach would require multiple rows for each friend in the list. This would repeat OwnerID multiple times, violating normal form. The DBA's solution is more normalized, having only values that depend on FriendListID in the FriendList table.
The best practice here is to be good friends with the DBA. I'd go with his solution because it doesn't matter much, and you're sure to need him later on.
The only schema that makes sense to me is this:
Person
PersonID Int PK
Friend
PersonID Int PK FK
OtherPersonID Int PK FK
So you might have a procedure called FriendList
that executes this nice clean query:
Select Person.*
From Friend
Inner Join Person On Friend.OtherPersonID = Person.PersonID
Where Friend.PersonID = @PersonID;
I do not condone selecting all columns.
Your design violates Third Normal Form, if Network.Owners_PersonID
is stored redundantly for a network.
But I don't understand how the DBA's design actually helps. I would have expected Friends
to be the many-to-many table between Users
and Networks
:
CREATE TABLE tblUsers (
PersonID INT IDENTITY PRIMARY KEY
);
CREATE TABLE tblNetworks (
NetworkID INT IDENTITY PRIMARY KEY,
Owner_PersonID INT NOT NULL REFERENCES tblUsers
);
CREATE TABLE tblFriends (
NetworkID INT NOT NULL REFERENCES tblNetworks,
FriendID INT NOT NULL REFERENCES tblUsers,
PRIMARY KEY(NetworkID, FriendID)
);
In other words, you have a simple many-to-many relationship:
Users ----<- Friends ->---- Networks
And additionally, Networks
references Users
just to identify the owner of the given network. This way there's only one row for a given network, so you can't create an update anomaly by changing the owner of the network on some rows.
I don't think this is splitting the entities into separate tables excessively. You can still get a list of friends for a given network:
SELECT ... FROM Networks n JOIN Friends f ON (n.NetworkID=f.NetworkID)
You can get all of a user's friends from all networks this way (pass the given user's id for the ?
parameter):
SELECT ... FROM Friends u
JOIN Friends f ON (u.NetworkID=f.NetworkID)
WHERE u.UserID = ?
In your original design, it's pretty much the same:
SELECT ... FROM Networks u
JOIN Networks f ON (u.Owner_UserID=f.Owner_UserID)
WHERE u.FriendID = ?
But the advantage is that you've eliminated the possible update anomaly.
What I am wanting to do is to create a second table, tblNetwork, that will hold all of the relationships between users, with (NetworkID int identity PK, Owners_PersonID int FK, Friends_PersonID int FK, etc). Or conversely, remove the NetworkID, and have both the Owners_PersonID and Friends_PersonID shared as the Primary key.
I don't see any problem with this. And I agree that the NetworkID
is superfluous -- the two FKs are the natural key for the table, and so you should just use them as the primary key, unless you have some performance reason why you need to refer to specific relationships by a surrogate ID (which you don't seem to have in this case).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With