Let's say you have a database with a single table like...
---------------------------------------------
| Name | FavoriteFood |
---------------------------------------------
| Alice | Pizza |
| Mark | Sushi |
| Jack | Pizza |
---------------------------------------------
Would it be more space-efficient to have an additional table called "Strings" that stores strings, and change the FavoriteFood column to an index in the string table. In the above example, "Pizza" looks like it is stored twice, but with the additional table, it would appear to be stored only once. Of course, please assume there are 1,000,000 rows and 1,000 unique strings instead of just 3 rows and 2 unique strings.
Edit: We don't know what the FavoriteFoods are beforehand: they are user-supplied. The programmatic interface to the string table would be something like...
String GetString(int ID) { return String at with Row-ID == ID }
int GetID(String s) {
if s exists, return row-id;
else {
Create new row;
return new row id;
}
}
So the string-table seems more efficient, but do modern databases already do that in the background, so I can just do the simple one table approach and be efficient?
What are you measuring efficiency by? Assuming there is no other data associated with each FavoriteFood (in which case obviously you want two tables), a one-table approach is probably more time efficient, as the unnecessary join would incur an extra processing cost. On the other hand, a two-table approach may be more space-efficient, since it takes less space to store an index than a string, but that depends on how the particular database that you're using optimizes storage of repeated strings.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With