Question about how foreign key data is stored in SQL

I know this is ultra-basic, but it's an assumption I've always held, and I'd like to validate that it's true (in general; I realize the details are specific to each implementation).

Let's say I have a table that has a text column "Fruit". In that column only one of four values ever appears: Pear, Apple, Banana, and Strawberry. I have a million rows.

Instead of repeating that data (on average) a quarter million times each, if I extract it into another table that has a Fruit column and just those four rows, and then make the original column a foreign key, does it save space?

I assume that the four fruit names are stored only once, and that the million rows now have pointers or indexes or some kind of reference into the second table.

If my row values are longer than short fruit names I assume the savings/optimization is even larger.

Dave asked Aug 24 '11 21:08


3 Answers

The data types of the fields on both sides of a foreign key relationship have to match (many engines require them to be identical, or at least compatible).

If the parent table's key field is (say) varchar(20), then the foreign key fields in the dependent table will also have to be string columns, typically varchar(20) as well. So if the fruit name itself is the key, you save nothing: you'd still have X million rows of 'Apple' and 'Pear' and 'Banana' repeated in each table that has a foreign key pointing back at the fruit table.

Generally it's more efficient to use numeric fields as keys (int, bigint), as those can be compared with very few CPU instructions (often a single compare instruction). Strings, on the other hand, require loops and comparatively expensive setup. So yes, you'd be better off storing the fruit names in a table somewhere and using their associated numeric ID fields as the foreign key.

Of course, you should benchmark both setups. These are just general rules of thumb, and your specific requirements/setup may actually work faster with the strings-as-key version (for example, because reading the fruit name then needs no join).
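
One way to run such a benchmark is sketched below. It uses Python's built-in sqlite3 module purely as a convenient stand-in; the table names (fruits, items), the row count, and the query are assumptions for illustration, and timings on another engine or workload may look quite different.

```python
import sqlite3
import time

FRUITS = ["Pear", "Apple", "Banana", "Strawberry"]


def build(use_surrogate_key, n_rows=100_000):
    """Create an in-memory database using one of the two key layouts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    if use_surrogate_key:
        # Integer id as primary key; the big table stores only the id.
        conn.execute("CREATE TABLE fruits (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
        conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, "
                     "fruit_id INTEGER NOT NULL REFERENCES fruits(id))")
        conn.executemany("INSERT INTO fruits (id, name) VALUES (?, ?)",
                         list(enumerate(FRUITS, start=1)))
        conn.executemany("INSERT INTO items (fruit_id) VALUES (?)",
                         ((i % 4 + 1,) for i in range(n_rows)))
    else:
        # The fruit name itself is the key, so every row repeats the string.
        conn.execute("CREATE TABLE fruits (name TEXT PRIMARY KEY)")
        conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, "
                     "fruit TEXT NOT NULL REFERENCES fruits(name))")
        conn.executemany("INSERT INTO fruits (name) VALUES (?)", [(f,) for f in FRUITS])
        conn.executemany("INSERT INTO items (fruit) VALUES (?)",
                         ((FRUITS[i % 4],) for i in range(n_rows)))
    conn.commit()
    return conn


def count_per_fruit(conn, use_surrogate_key):
    """Run the same logical query against either layout and time it."""
    if use_surrogate_key:
        sql = ("SELECT f.name, COUNT(*) FROM items i "
               "JOIN fruits f ON f.id = i.fruit_id GROUP BY f.name ORDER BY f.name")
    else:
        sql = "SELECT fruit, COUNT(*) FROM items GROUP BY fruit ORDER BY fruit"
    start = time.perf_counter()
    result = conn.execute(sql).fetchall()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for surrogate in (True, False):
        rows, seconds = count_per_fruit(build(surrogate), surrogate)
        label = "integer key" if surrogate else "string key"
        print(f"{label}: {seconds:.4f}s {rows}")
```

Note that the string-key query needs no join at all, which is one reason the numbers can go either way.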

Marc B answered Oct 17 '22 13:10


That is correct.

You should have

table fruits
id   name
1    Pear
2    Apple
3    Banana
4    Strawberry

Here id is the primary key. In your original table you store just the id from this table instead of the name. That will save you physical space and will generally make your select statements faster.
Besides, this structure makes it very easy to add new fruits.
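
A minimal sketch of that layout, again using Python's sqlite3 module so it can be run as-is. The orders table and the added 'Mango' row are hypothetical; only the fruits table comes from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.executescript("""
CREATE TABLE fruits (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (              -- hypothetical "large" table
    id       INTEGER PRIMARY KEY,
    fruit_id INTEGER NOT NULL REFERENCES fruits(id)
);
INSERT INTO fruits (id, name)
VALUES (1, 'Pear'), (2, 'Apple'), (3, 'Banana'), (4, 'Strawberry');
INSERT INTO orders (fruit_id) VALUES (2), (2), (4);
""")

# Adding a new fruit is a single insert into the small table.
new_id = conn.execute("INSERT INTO fruits (name) VALUES ('Mango')").lastrowid

# The names come back through a join on the integer key.
names = [row[0] for row in conn.execute(
    "SELECT f.name FROM orders o JOIN fruits f ON f.id = o.fruit_id ORDER BY o.id")]

# A reference to a fruit that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders (fruit_id) VALUES (99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The large table holds only small integers, and each fruit name is stored once in fruits.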

Andrey answered Oct 17 '22 12:10


Instead of repeating that data (on average) a quarter million times each, if I extract it into another table that has a Fruit column and just those four rows, and then make the original column a foreign key, does it save space?

No, not if "Fruit" itself is the PRIMARY KEY of the "lookup" table, because then the same text must also be stored as the FOREIGN KEY in the "large" table.

However, if you make a small surrogate PRIMARY KEY (such as an integer "id") in the "lookup" table and then use that as the FOREIGN KEY in the "large" table, you'll save space.
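
To check this on an actual engine, one option is to build both variants and compare the resulting file sizes. The sketch below does that with SQLite through Python's sqlite3 module; the lookup/large table names follow the wording above, while the row count and the column name fruit_id are assumptions. Other engines store rows differently, so the ratio will vary.

```python
import os
import sqlite3
import tempfile

FRUITS = ["Pear", "Apple", "Banana", "Strawberry"]

# Variant 1: the fruit name is the primary key and is repeated in every row.
TEXT_KEY_SETUP = """
CREATE TABLE lookup (fruit TEXT PRIMARY KEY);
CREATE TABLE large (id INTEGER PRIMARY KEY,
                    fruit TEXT NOT NULL REFERENCES lookup(fruit));
INSERT INTO lookup VALUES ('Pear'), ('Apple'), ('Banana'), ('Strawberry');
"""

# Variant 2: a surrogate integer key; the large table stores only the id.
INT_KEY_SETUP = """
CREATE TABLE lookup (id INTEGER PRIMARY KEY, fruit TEXT NOT NULL UNIQUE);
CREATE TABLE large (id INTEGER PRIMARY KEY,
                    fruit_id INTEGER NOT NULL REFERENCES lookup(id));
INSERT INTO lookup (id, fruit)
VALUES (1, 'Pear'), (2, 'Apple'), (3, 'Banana'), (4, 'Strawberry');
"""


def database_size(setup_sql, insert_sql, rows):
    """Build a throwaway database file and return its size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fruit.db")
        conn = sqlite3.connect(path)
        conn.executescript(setup_sql)
        conn.executemany(insert_sql, rows)
        conn.commit()
        conn.close()
        return os.path.getsize(path)


def compare_sizes(n_rows=100_000):
    """Return (bytes with text key, bytes with integer key)."""
    text_rows = ((FRUITS[i % 4],) for i in range(n_rows))
    int_rows = ((i % 4 + 1,) for i in range(n_rows))
    text_bytes = database_size(
        TEXT_KEY_SETUP, "INSERT INTO large (fruit) VALUES (?)", text_rows)
    int_bytes = database_size(
        INT_KEY_SETUP, "INSERT INTO large (fruit_id) VALUES (?)", int_rows)
    return text_bytes, int_bytes


if __name__ == "__main__":
    text_bytes, int_bytes = compare_sizes()
    print(f"text key: {text_bytes} bytes, integer key: {int_bytes} bytes")
```

Foreign key enforcement is left off here because only the storage footprint is being measured.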

Branko Dimitrijevic answered Oct 17 '22 11:10