I have an online shop where users can run little shops with their own products. Each of these products can have questions associated with it, and the owner of the shop can answer those questions. This information is stored in three tables: a "Questions"(QuestionID,ProductID,...) table, a "Products"(ProductID,ShopID,...) table and a "Shop"(ShopID,OwnerID,...) table.
Is it better to have a ShopID in the 'Questions' table (to allow a shop owner to view all his questions) or to join those three tables to get Questions matching a certain Shop?
Data redundancy occurs when the same piece of data exists in multiple places, whereas data inconsistency is when the same data exists in different formats in multiple tables. Unfortunately, data redundancy can cause data inconsistency, which can provide a company with unreliable and/or meaningless information.
Redundant data is a bad idea because when you modify data (update/insert/delete), you need to do it in more than one place. This opens up the possibility that the data becomes inconsistent across the database. Redundancy is sometimes introduced anyway, usually for performance reasons.
Even though data redundancy can help minimize the chance of data loss, redundancy issues can affect larger data sets. For example, data that is stored in several places takes up valuable storage space and makes it difficult for the organization to identify which data they should access or update.
Using the SQL JOIN clause is the usual way to query multiple tables. Sooner or later, you'll have to use more than one table in a query. That is the nature of relational databases in general: their data is usually spread across multiple tables that together form the database.
It is almost always better to join and avoid redundant information. You should only denormalize when you must do so in order to meet a performance goal - and you can't know if you need to do this until you try with normalized tables first.
Note that denormalization helps in read performance at the expense of slowing down writes and making it easier for a coding mistake to cause data to be out of sync (since you're storing the same thing in more than one place you now have to be sure to update it all).
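To make the out-of-sync risk concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a denormalized "Questions" table that carries its own ShopID copy; the sample rows and the scenario of a product moving to another shop are hypothetical, and the "..." columns from the question are left out.

```python
import sqlite3

# In-memory database; schema trimmed to the columns named in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    ProductID INTEGER PRIMARY KEY,
    ShopID    INTEGER NOT NULL
);
-- Denormalized variant: Questions stores a redundant copy of ShopID.
CREATE TABLE Questions (
    QuestionID INTEGER PRIMARY KEY,
    ProductID  INTEGER NOT NULL,
    ShopID     INTEGER NOT NULL
);
INSERT INTO Products  VALUES (10, 1);     -- hypothetical sample data
INSERT INTO Questions VALUES (1, 10, 1);
""")

# A write path that updates Products but forgets the copy in Questions.
conn.execute("UPDATE Products SET ShopID = 2 WHERE ProductID = 10")

# Find questions whose redundant ShopID no longer matches the product's shop.
stale = conn.execute("""
    SELECT q.QuestionID, q.ShopID, p.ShopID
    FROM Questions AS q
    JOIN Products  AS p ON p.ProductID = q.ProductID
    WHERE q.ShopID <> p.ShopID
""").fetchall()

print(stale)
```

Every code path that changes a product's shop would have to update both tables (or a trigger would have to do it), which is the extra write cost mentioned above.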
Generally it is better to avoid redundant information. This looks like a cheap join given appropriate indexes, and I wouldn't denormalise in that manner unless I saw in the query plans that the JOIN was causing problems (perhaps because of the number of records in the tables).
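As a rough illustration of the normalized approach, the sketch below builds the three tables in an in-memory SQLite database and fetches questions by joining. The `Body` column, the index names, the helper function names and the sample rows are assumptions made for the example, not part of the original schema. Note that filtering by ShopID only needs "Questions" and "Products"; the "Shop" table comes in when you filter by OwnerID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Shop (
    ShopID  INTEGER PRIMARY KEY,
    OwnerID INTEGER NOT NULL
);
CREATE TABLE Products (
    ProductID INTEGER PRIMARY KEY,
    ShopID    INTEGER NOT NULL REFERENCES Shop(ShopID)
);
CREATE TABLE Questions (
    QuestionID INTEGER PRIMARY KEY,
    ProductID  INTEGER NOT NULL REFERENCES Products(ProductID),
    Body       TEXT                      -- hypothetical column
);
-- Indexes on the foreign keys used by the joins below.
CREATE INDEX idx_products_shop     ON Products(ShopID);
CREATE INDEX idx_questions_product ON Questions(ProductID);
CREATE INDEX idx_shop_owner        ON Shop(OwnerID);

-- Hypothetical sample data.
INSERT INTO Shop      VALUES (1, 100), (2, 200);
INSERT INTO Products  VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO Questions VALUES (1, 10, 'Is it waterproof?'),
                             (2, 11, 'Which sizes?'),
                             (3, 20, 'Do you ship abroad?');
""")

def questions_for_shop(conn, shop_id):
    """Questions for one shop: a two-table join is enough."""
    return conn.execute("""
        SELECT q.QuestionID, q.ProductID, q.Body
        FROM Questions AS q
        JOIN Products  AS p ON p.ProductID = q.ProductID
        WHERE p.ShopID = ?
        ORDER BY q.QuestionID
    """, (shop_id,)).fetchall()

def questions_for_owner(conn, owner_id):
    """Questions across all shops of one owner: joins all three tables."""
    return conn.execute("""
        SELECT q.QuestionID, q.ProductID, q.Body
        FROM Questions AS q
        JOIN Products  AS p ON p.ProductID = q.ProductID
        JOIN Shop      AS s ON s.ShopID = p.ShopID
        WHERE s.OwnerID = ?
        ORDER BY q.QuestionID
    """, (owner_id,)).fetchall()

print(questions_for_shop(conn, 1))
print(questions_for_owner(conn, 200))
```

On a real database you would inspect the query plan (for example with `EXPLAIN QUERY PLAN` in SQLite, or the equivalent in your engine) before deciding that the join is a bottleneck.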
You would also need to consider the ratio of reads to writes. Denormalisation will help the reads but add overhead to writes.