I need to fit additional data into a database, and I have a choice between modifying an existing table (table_existing) or creating new tables.
This is what table_existing looks like right now:
table_existing
-------------------------
| ID | SP | SV | Field1 |
| .. | WW | 1 | ...... |
| .. | WW | 1 | ...... |
-------------------------
Option (A)
table_existing
----------------------------------------------------------------------
| ID | SP | SV | Field1 | Field2 | Field3 | Field4 | Field5 | Field6 |
| .. | XX | 1 | ...... | ...... | ...... | ...... | ...... | ...... |
| .. | YY | 2 | ...... | ...... | ...... | ...... | ...... | ...... |
----------------------------------------------------------------------
Option (B)
table_existing would be converted into table_WW_1_data
---------------
| ID | Field1 |
| .. | ...... |
| .. | ...... |
---------------
table_XX_1_data
------------------------
| ID | Field1 | Field2 |
| .. | ...... | ...... |
| .. | ...... | ...... |
------------------------
table_YY_2_data
---------------------------------
| ID | Field1 | Field2 | Field3 |
| .. | ...... | ...... | ...... |
| .. | ...... | ...... | ...... |
---------------------------------
Context: The combination of (SP, SV) determines the number of fields that will be populated. For instance, (XX, 1) has 2 fields; (YY, 2) has 3 fields.
If I were to go with Option (A) I would have many empty/NULL values in the "wider" table.
If I go with Option (B), I am basically creating more tables... one for "each" combination of SP, SV - there will be perhaps 4-5 in total. But each would be fully populated with the right number of fields. table_existing would be changed as well.
What is the more optimal database structure from the speed point of view? I think that from the maintainability point of view, Option (B) might be better.
Edit1
Neither of the two Options will be the most critical / frequently used tables in my application.
In Option (B), after the data has been split up, there would be no need of JOINing them at all. If I know I need Fields for XX_1, I will go to that table.
I'm trying to understand the pros and cons of having ONE large table with many unused values vs having the same data split across a larger number of tables. Does a larger number of tables lead to a performance hit in the database (we've got ~80 tables already)?
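To make the Option (B) layout concrete, here is a minimal sketch using SQLite from Python. The table and column names come from the question; the column types and sample values are assumptions for illustration only.

```python
import sqlite3

# Option (B): one table per (SP, SV) combination, each holding exactly the
# columns that combination needs -- no NULLs, no unused columns.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table_WW_1_data (ID INTEGER PRIMARY KEY, Field1 TEXT)")
cur.execute("CREATE TABLE table_XX_1_data (ID INTEGER PRIMARY KEY, Field1 TEXT, Field2 TEXT)")
cur.execute("CREATE TABLE table_YY_2_data "
            "(ID INTEGER PRIMARY KEY, Field1 TEXT, Field2 TEXT, Field3 TEXT)")

cur.execute("INSERT INTO table_XX_1_data VALUES (1, 'a', 'b')")
cur.execute("INSERT INTO table_YY_2_data VALUES (1, 'a', 'b', 'c')")

# Knowing the (SP, SV) combination tells you which table to read,
# so no JOIN is needed -- exactly the access pattern described in Edit1.
row = cur.execute("SELECT Field1, Field2 FROM table_XX_1_data WHERE ID = 1").fetchone()
print(row)  # ('a', 'b')
```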
What is the more optimal database structure from the speed point of view?
Well, what is correct, best practice, etc, is called Normalisation. If you do that correctly, there will be no optional columns (not fields), no Nulls. The optional columns will be in a separate table, with fewer rows. Sure, you can arrange the tables so that they are sets of optional columns, rather than (one PK plus) one column each.
Combining the rows from the sub-tables into one 5NF row is easy; do that in a view (but do not update via the view, do that directly to each sub-table, via a transactional stored proc).
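A minimal SQLite sketch of that view-based approach: the optional columns live in a sub-table keyed by the parent's primary key, and a view reassembles the full row for reads. All names here (person, person_extra, person_full) are illustrative assumptions, not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Parent table holds the always-present columns; the sub-table holds the
# optional column, with a row only for the IDs that actually have a value.
cur.execute("CREATE TABLE person (ID INTEGER PRIMARY KEY, Field1 TEXT NOT NULL)")
cur.execute("CREATE TABLE person_extra "
            "(ID INTEGER PRIMARY KEY REFERENCES person(ID), Field2 TEXT NOT NULL)")
cur.execute("INSERT INTO person VALUES (1, 'a'), (2, 'b')")
cur.execute("INSERT INTO person_extra VALUES (1, 'x')")  # only ID 1 has Field2

# The view presents the combined row for reading; writes go directly
# to each sub-table, as the answer advises.
cur.execute("""
    CREATE VIEW person_full AS
    SELECT p.ID, p.Field1, e.Field2
    FROM person p
    LEFT JOIN person_extra e ON e.ID = p.ID
""")
rows = cur.execute("SELECT * FROM person_full ORDER BY ID").fetchall()
print(rows)  # [(1, 'a', 'x'), (2, 'b', None)]
```

Note that neither base table stores a NULL; the NULL only appears in the view's output, where a parent row has no matching sub-row.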
More, smaller tables are the nature of a Normalised Relational database. Get used to it. Fewer, larger tables are slower, due to the lack of Normalisation, duplicates, and Nulls. Joining is cumbersome in SQL, but that is all we have. There is no cost in the joins themselves, only in the tables being joined (rows, row width, join columns, datatypes, mismatches, indices [or not]). Databases are optimised for Normalised tables, and for large numbers of them, not for data heaps.
Which happens to be optimal re performance, no surprise. For two reasons:
The tables are narrower, so there are more rows per page, you get more rows per physical I/O, and more rows in the same cache space.
Since you have no Nulls, those columns are fixed length, so there is no unpacking to extract the contents of the column.
There are no pros for large tables with many optional (null) columns, only cons. There never is a pro for breaching standards.
The answer is unchanged regardless of whether you are contemplating 4 or 400 new tables.