MySQL table design and normalization assistance

Tags:

Note: This question has been rephrased on 11/19/12 for clarification. I typically don't have much issue here but struggling designing a new product system for a client site. We offer a suite of products each client can sell to his customers. We may add new products at anytime but they all follow this format:

Category
Type
Product

To give a real world example using the structure from before:

Baseball Equipment
- Gloves
  - Rawlings
  - Nike
  - Mizzuno
- Bats
  - Easton
  - Louisville Slugger
Football Equipment
- Shoes
  - Nike
  - Reebok
  - Adidas
- Footballs
  - Nike
  - Saplding
  - Wilson

The list above clearly continues and can be much, much larger but it gives the overall idea.

Currently, I am storing the types of products particular clients can sell in a single flat format table as follows:

ID  | clientID | categoryID | typeID | productID | customURL
=============================================================
1   |  111     |    1       |   1    |   1       | 1111
2   |  111     |    1       |   2    |   2       | 2222
3   |  111     |    1       |   2    |   3       | 3333
4   |  111     |    2       |   3    |   4       | 4444
5   |  222     |    1       |   1    |   1       | 5555
6   |  222     |    2       |   3    |   4       | 6666

In the example above, category 1 can be "baseball equipment" and category 2 is "football equipment"
The names of the corresponding categoryID, typeID, and productID would be stored in 3 seaprate tables with FK relationships (innodb) so as to maintain normalization.
the type refers to the second level items (gloves, bats, shoes, footballs, etc). These numbers never intersect (meaning there can never be the same typeID even if the general product is the same (shoes in baseball has a separate id than shoes for football).
In this table, clientID 1 can sell 4 products, 3 in category 1 and 1 in category 2. ClientID 2 can sell 2 products, one in each category.

I am inclined to keep the table as structured but know in other design I may have separated the tables for normalization purposes I am not sure that apply here. If I broke them out, I would see this going from one table to 4 or more as follows:

productsOffered table

ID  | clientID | productID | customURL
======================================
1   |  111     | 1       | 1111
2   |  111     | 2       | 2222
3   |  111     | 3       | 3333
4   |  111     | 4       | 4444
5   |  222     | 1       | 5555
6   |  222     | 4       | 6666

productsDefinition Table

ID  | productID | typeID | productName
======================================
1   |  1        |    1   | rawlings glove
2   |  2        |    2   | product2
3   |  3        |    2   | product3
4   |  4        |    3   | product4

typeDefinition Table

ID  | typeID | categoryID | typeName
=====================================
1   |  1     |    1       | Gloves
2   |  2     |    1       | Bats
3   |  3     |    2       | Shoes
4   |  4     |    2       | Footballs

categoriesDefinition Table

ID  | categoryID | catName
=============================
1   |  1         | Baseball Equipment
2   |  2         | Football Equipment

Am I over thinking this? Don't both methods get the end solution the same way (the latter just involves several joins to gather the flat table as shown in figure 1)?

200

asked Nov 16 '12 17:11

JM4

2 Answers

The purpose and benefit of normalization is that it makes it harder (ideally, impossible) to enter anomalous data.

For example, in your figure 1, what's to prevent you from accidentally storing a row with typeid 3 and categoryid 1? Nothing, besides writing application code that is absolutely perfect.

But if you use your single-table approach, and you ever have to change the parent category of typeid 3, you'd have to change the data in a million places to reflect the change. This means locking the table while you perform that cleanup, or else new data could be inserted concurrently.

Normalization helps to eliminate storing information redundantly, and if every discrete fact (e.g. typeid 3 belongs to categoryid 2) is stored only once, then it's easy to make changes atomically, and which automatically change the meaning of all references to that row.

You're right that more joins are needed -- but only if you use pseudokeys all over the place like you're doing. You don't necessarily need to do that, you could use natural keys instead, and references to them would be declared with cascading foreign keys so a change in a lookup table automatically updates referencing tables too.

Certainly rules of normalization do not mandate using pseudokeys. These rules say nothing about them.

Re your comment: a pseudokey, or surrogate key, is the "id" column that's used to identify rows. Typically the values are allocated through an automatic incrementing mechanism that ensures uniqueness while allowing concurrent transactions to insert rows. The value of an id has no meaning with respect to the row it identifies.

Below shows what your tables would look like in normal form, but without surrogate keys.

productsOffered table

client | product        | customURL
===================================
Smith  | Rawlings Glove | 1111
Smith  | Product 2      | 2222
Smith  | Product 3      | 3333
Smith  | Product 4      | 4444
Jones  | Rawlings Glove | 5555
Jones  | Product 4      | 6666

productsDefinition Table

product        | type
=======================
Rawlings Glove | Gloves
Product 2      | Bats
Product 3      | Bats
Product 4      | Shoes

typeDefinition Table

type      | category
==============================
Gloves    | Baseball Equipment
Bats      | Baseball Equipment
Shoes     | Football Equipment
Footballs | Football Equipment

categoriesDefinition Table

category
==================
Baseball Equipment
Football Equipment

It's perfectly in keeping with relational database design and normalization to use non-integers as the data type for a primary key column, and therefore the foreign keys referencing them from other tables.

There are good reasons to use surrogate keys, for the sake of performance or brevity or allowing the values in other columns to change freely. But normalization does not mandate using surrogate keys.

answered Sep 20 '22 10:09

Bill Karwin

I would go for the normalised approach, as you have to maintain separate lookup tables for category and type names (and possibly other attributes) with the flat approach anyway.

You might consider changing the category and type into a general tree structure using a table such as:

 create table product_hierarchy(
    id integer primary key,
    name character,
    parent_id references product_hierarchy)

... as it would give the client the flexibility to add more depth to the hierarchy.

answered Sep 21 '22 10:09

David Aldridge

Related questions
                            
                                Command history in mysql client only showing last line of multiline query
                            
                                Is it bad to use enum('y','n') instead of a boolean field in a MySQL table?
                            
                                Connecting to MySQL through Cygwin
                            
                                Get parents recursive in right order for deletion from MySQL table
                            
                                Trouble using PDO for the first time
                            
                                Rename outfile with date in mysql
                            
                                Alias to Show Tables MySQL Result
                            
                                mysql query - insert data unix_timestamp ( now ( ) ) issue
                            
                                Apache Solr java tutorials [closed]
                            
                                JDBC get the relation+attribute referenced by a foreign key
                            
                                How to Efficiently use SQL to Retrieve Data on Half Hour Intervals?
                            
                                MySQL table is marked as crashed
                            
                                PHP generate dynamic PDO insert
                            
                                how can I use the mysqldump correctly [closed]
                            
                                MySQL debugging tool slow queries? [closed]
                            
                                Sum multiple rows
                            
                                mysqli_query, mysqli_fetch_array and while loop
                            
                                LEFT JOIN in MySQL using multiple ON conditions
                            
                                What are the differences in the following variable initialization styles in MySQL?
                            
                                MySQL GROUP BY "and filter"

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

MySQL table design and normalization assistance

Tags:

mysql

database-design

normalization

productsOffered table

productsDefinition Table

typeDefinition Table

categoriesDefinition Table

JM4

People also ask

2 Answers

productsOffered table

productsDefinition Table

typeDefinition Table

categoriesDefinition Table

Bill Karwin

David Aldridge

Recent Activity

Donate For Us