Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Shared Purpose tables in database design ( how would *you* implement a question/answer facility )

I have long considered the design of a database that involves shared table purposes to be somewhat a trait of smelly code, and progressively increasing proliferation of smelly-code related problems.

By this I mean, people over-normalizing, using 1 table where 2 tables could be more logical, people who've just discovered what normalizing is and overuse it to the point where they virtually layer a database on a database, or trying to use hierarchical data naively to solve all their problems.

The question is, when you have 2 sets of data, which appear to be the same, but they have a different purpose, do you use the same table to represent it? When do you know when it is a good idea and a bad idea to use this?

I've always been of the mind that needless self-referencing tables or structures that involved any table being used twice in the same query was a very dangerous predicament, both in design, and in long term performance and ease of future improvement.

That was of course, until I saw a thing or two in the RSS feeds for SO here. Now I'm not going onto a meta discussion about how SO works, but there appears to be an implicit design consideration here that challenged my thinking and want to glean a more cohesive answer on that style of logic.

You'll notice the generic question has the format:

 /question/1234/stuff-here-that-s-safe-to-leave-out

And I generally assumed that this implied that questions were numerically ordered somewhere.

But here is what stumped me: I'll take for example question 316210. If you look at the rss feed for this question, you will note there is a entry for the question, and a series of answer entries, which are functionally identical for the question entry except for a few minor differences. Now note in the answer entries also have link references, ... to questions, not the same question however, but to different questions, such as question 316218, which , when visited, redirects you back to the original question.

Now I'm not interested in how they implement that in the code, the problem is that you have here, questions and answers appear to be sharing the same table ( hence the sequential question ID's ) , and when users refer to an answer ID, you have to first query the database, and then go "hey!, oops!, that's not a question!" and then proceed to do a second query to find out the parent of that question ( in the same table ) and then redirect you to the actual question page, not to mention all the hullabaloo required with self joining queries ( which I've always considered filthy ) and conditionals all over the place to tune behavior.

Less Digressing, the real problem

The problem is, here you have 2 sets of data sharing the same table, and sure, this data is superficially similar, for now at least, but there looks like there is just so much technical debt involved.

The long term considerations involved with implementing new features that can apply to questions and not answers and vice versa, not to mention avoiding one being interpreted as another in some obscure corner. You can't add a new column for use in one application set without having to consider the resulting effects in another.

Sure, there is a minor benefit from using the singular table and that's when you are making a feature that is shared between facets you only have to code it once, but this could be just as easily represented by using an ancestor class of common methods, and child classes that bind to the specific tables for the difference cases. So at least that way, adding a new feature has no follow-on implications for the other scenario.

Now I've encountered this sort of problem in many places before, sure, but SO is the most easy example to point out.

When you implement your databases like this, do you share the table, or do you fork?

When, and why?

like image 819
Kent Fredric Avatar asked Nov 25 '08 03:11

Kent Fredric


2 Answers

I think that from the perspective of SO, both questions and responses are the same thing -- user posts. They just happen to be related. If a post has no parent, then it's a question. If a post does have a parent, then it's an answer. I find this perfectly reasonable though I'm not sure I would make the same choice since there are some significant differences. Perhaps these are stored in separate tables though.

I've basically done the same thing in one of my applications. I track events. An event is a "Master" event if it has no parent. If it has a parent, then it has to be a subevent of a "Master" event. They share many of the same base properties and so they share a table. "Master" events have some additional properties that are stored in a separate table. Generally when I'm selecting subevents, I already know the "Master" event and so a separate query is not needed.

like image 200
tvanfosson Avatar answered Oct 22 '22 14:10

tvanfosson


We have a similar thing with our student management system which stores applications and enrollments on the same table. An application is in essence a special case of enrollment on a course sort of not active yet. In my case it's probably best to use it this way as an application is easy to upgrade to an enrollment.

Really it depends how you are going to end up querying the data, I suspect a Q and A system is going to have a search facility so this is going to want to search questions and answers so it makes sense to have them in the same table. Also the questions and answers likely require the same extra data such as a date modified, who sent it... Just make sure you have a field in the index of the table defining if it is a question or an answer if you go this route.

It is worth forking when you are having to work around your solution, but remember just because you initially have it as one table it is relatively easy to run a script through separating it off later.

like image 40
PeteT Avatar answered Oct 22 '22 16:10

PeteT