Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Normalize database or not? Read only MyISAM table, performance is the main priority (MySQL)

I'm importing data to a future database that will have one, static MyISAM table (will only be read from). I chose MyISAM because as far as I understand it's faster for my requirements (I'm not very experienced with MySQL / SQL at all).

That table will have various columns such as ID, Name, Gender, Phone, Status... and Country, City, Street columns. Now the question is, should I create tables (e.g Country: Country_ID, Country_Name) for the last 3 columns and refer to them in the main table by ID (normalize...[?]), or just store them as VARCHAR in the main table (having duplicates, obviously)?

My primary concern is speed - since the table won't be written into, data integrity is not a priority. The only actions will be selecting a specific row or searching for rows that much a certain criteria.

Would searching by the Country, City and/or Street columns (and possibly other columns in the same search) be faster if I simply use VARCHAR?

EDIT: The table has about 30 columns and about 10m rows.

like image 368
hello Avatar asked May 14 '10 22:05

hello


2 Answers

It can be faster to search if you normalize as the database will only have to compare an integer instead of a string. The table data will also be smaller which makes it faster to search as more can be loaded into memory at once.

If your tables are indexed correctly then it will be very fast either way - you probably won't notice a significant difference.

You might also want to look at a full text search if you find yourself writing LIKE '%foo%' as the latter won't be able to use an index and will result in a full table scan.

like image 89
Mark Byers Avatar answered Sep 19 '22 02:09

Mark Byers


I'll try to give you something more than the usual "It Depends" answer.

#1 - Everything is fast for small N - if you have less than 100,000 rows, just load it flat, index it as you need to and move on to something higher priority.

Keeping everything flat in one table is faster for reading everything (all columns), but to seek or search into it you usually need indexes, if your data is very large with redundant City and Country information, it might be better to have surrogate foreign keys into separate tables, but you can't really say hard and fast.

This is why some kind of data modeling principles are almost always used - either traditional normalized (e.g. Entity-Relationship) or dimensional (e.g. Kimball) is usually used - the rules or methodologies in both cases are designed to help you model the data without having to anticipate every use case. Obviously, knowing all the usage patterns will bias your data model towards supporting them - so a lot of aggregations and analysis is a strong indicator to use a denormalized dimensional model.

So it really depends a lot on your data profile (row width and row count) and usage patterns.

like image 34
Cade Roux Avatar answered Sep 22 '22 02:09

Cade Roux