Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

One table or many? [closed]

I'm trying to design an application to hold academic reference information. The problem is that each different type of reference (eg. journal articles, books, newspaper articles etc) requires different information. For example a journal reference requires both a journal title and an article title, and also a page number, whereas a book requires a publisher and a publication date which journal articles do not require.

Therefore, should I have all the references stored in one table in my database and just leave fields blank when they don't apply, or should I have various tables such as BookReferences, JournalReferences, NewspaperReferences and put the appropriate references in each one. The problem then would be that it would make searching through all the references rather more difficult, and also editing would have to be done rather more separately probably.

(I'm planning to use Ruby on Rails for this project by the way, but I doubt that makes any difference to this design question)

Update:

Any more views on this? I hoped to get a simple answer saying that a particular method was definitely considered 'the best' - but as usual things aren't quite as simple as this. The Single-Table Inheritance option looks quite interesting, but there isn't much information on it that I can find very easily - I may post another question on this site about that.

I'm split between Olvak's answer and Corey's answer. Corey's answer gives a good reason why Olvak's isn't the best, but Olvak's answer gives good reasons why Corey's isn't the best! I never realised this could be so difficult...

Any further advice much appreciated!

like image 796
robintw Avatar asked Oct 14 '08 08:10

robintw


People also ask

Why is it better to have multiple separate tables?

In many cases, it may be best to split information into multiple related tables, so that there is less redundant data and fewer places to update.

What is a many-to-many table called?

Junction table. When you need to establish a many-to-many relationship between two or more tables, the simplest way is to use a Junction Table. A Junction table in a database, also referred to as a Bridge table or Associative Table, bridges the tables together by referencing the primary keys of each data table.

What is the one table database?

Having a single-table design means you are putting all your data into the same table, and DynamoDB is not going to complain about that because it's schema-less key-value pair storage.

When data from more than one table in the database is required which is used?

Answer: B.Equijoin is one of the types of joins which is the most common and simple technique for joining more than one tables. Equijoins are also called simple joins or inner joins.


2 Answers

I'd go for having a single table for all references, but additional tables like BookReferences and so on for metadata not applicable for all reference types.

Searching and querying would not be more difficult - after all you could just create a view which aggregates all information as in the single-table solution, and then query that view further.

Having everything in one table with lots of nulls might seem like the simpler solution, but actually it will lead to lots of trouble. For example: With separate tables you can define which fields are required for every BookReference, but if everything is in one table, every field has to be nullable and therefore optional. It would also be easier to insert invalid data, like a book reference which also erroneously contains a non-null journal name.

Edit: Some people seem to fear joins. Don't fear the join! If you use the exact same join in several queries that would indeed be tedious, but in that case the join should be defined in a view, and you queries should query that view. Views are really the basic abstraction in relational databases, and you should use them for the same reasons you use functions in code: to avoid repetition, and to encapsulate and create abstractions.

Edit: There are some comments regarding performance. It's very hard to guess beforehand about performance of DB schemas, because it is often non-intuitive. For example a join between several tables can easily be faster than a full table scan of a single table - it all depends on the type of query, the nature of the data, the available indexes and so on. Additionally, in many database systems you can use features like materialized views to optimize performance for different queries without compromising the logical model. "Denormalization for performance" is mostly cargo cult these days IMHO, unless you are Google or Flickr.

like image 96
JacquesB Avatar answered Sep 19 '22 11:09

JacquesB


"life is easier with the one big table": I've seen the natural consequence of this, being a 100+ column table, and I can tell you I find this no joy to work with.

The main problem is that the designers of such tables tend to omit the constraints required to ensure data integrity. For example, the OP says:

a journal reference requires both a journal title and an article title, and also a page number, whereas a book requires a publisher and a publication date which journal articles do not require

...which implies the following constraints:

CONSTRAINT a_journal_must_have_a_journal_title    CHECK ( type <> 'journal' OR journal_title IS NOT NULL );  CONSTRAINT a_journal_must_have_an_article_title     CHECK ( type <> 'journal' OR article_title IS NOT NULL );  CONSTRAINT a_journal_must_have_a_page_number     CHECK ( type <> 'journal' OR page_number IS NOT NULL );  CONSTRAINT a_journal_cannot_have_a_publisher     CHECK ( type <> 'journal' OR publisher IS NULL );  CONSTRAINT a_journal_cannot_have_a_publication_date     CHECK ( type <> 'journal' OR publication_date IS NULL );  CONSTRAINT a_book_cannot_have_a_journal_title     CHECK ( type <> 'book' OR journal_title IS NULL );  CONSTRAINT a_book_cannot_have_a_article_title     CHECK ( type <> 'book' OR article_title IS NULL );  CONSTRAINT a_book_cannot_have_a_page_number     CHECK ( type <> 'book' OR page_number IS NULL );  CONSTRAINT a_book_must_have_a_publisher     CHECK ( type <> 'book' OR publisher IS NOT NULL );  CONSTRAINT a_jbook_must_have_a_publication_date     CHECK ( type <> 'book' OR publication_date IS NOT NULL ); 

...and I suspect that's only the tip of the iceberg!

It's my hope that after writing several hundred such constraints the designer may have second thoughts about all those nullable columns :)

like image 25
onedaywhen Avatar answered Sep 18 '22 11:09

onedaywhen