When designing a lookup table (enum) in SqlServer 2005, if you know the number of entries will never get very high, should you use tinyint instead of int? I'm most concerned about performance, particularly efficiency of indexes.
Let's say you have these representative tables:
Person
------
PersonId int (PK)
PersonTypeId tinyint (FK to PersonTypes)
and
PersonTypes
-----------
PersonTypeId tinyint
PersonTypeName varchar(50)
The obvious factors are data size and coding hassle. When we get to 100 million rows in the person table, we're storing 300 million less bytes with tinyint as opposed to int, plus the space taken up by our indexes. Not a huge amount of data, but significant if the design decision is applied to dozens of big tables. The coding hassle, of course, comes from all those casting problems back in the ASP.NET C#/VB code.
If we set aside those two issues, what else comes into play? Will queries be much more efficient due to the decreased size of the index pages? Or is there some sort of padding that happens that will just negate the benefits? Any other gotchas?
I've always just used ints personally, but I'm considering tinyint for an upcoming redesign/migration effort on some huge tables, so I'd love to get some advice.
[Edit]
After experimenting with this, the coding hassles I anticipated turned out to be a non-issue. Changing from int to tinyint hasn't resulted in any casting problems at all.
If you include 2 or 3, you have to use tinyint (at the very smallest scale). "For example, it's better to use tinyint when you know that the only data you will store is a 1, 0 or null (with a very small chance of expanding that to a 2 or 3 later)." I'd use an ENUM for such a thing.
SMALLINT values have only 16 bits. They can represent whole numbers from –32,767 through 32,767. The INT and SMALLINT data types have the following advantages: They take up little space (2 bytes per value for SMALLINT and 4 bytes per value for INTEGER).
Putting all the lookup values into a single table is usually referred to as Common Lookup Tables, or Massively Unified Code-Key (MUCK), and is generally considered a design error. Great argumentation of why it's not a good idea can be found in the article below.
Lookups are an intuitive table linking syntax provided to simplify data integration and SQL queries. They represent foreign key relationships between tables, and once established, can be used to "expose" columns from the "target" of the lookup in the source table or query.
The narrower a table (or index node entry) is, the more records (or index nodes) can fit on a single IO page, and the fewer physical (and logical) reads IO operations are required for any query. Also, the more index nodes there are on a single page, the fewer levels there may be in the index, from root to leaf level, and if by making a table narrower you pass the threshold where the index can be one level smaller, this can have a dramatic effect on perforamnce.
If by switching to TinyInt you change your table from 200 bytes wide to 197 bytes wide, it probably won't make any difference... But if you change it from 20 bytes to 14, (say you have 2 ints in there), then it could be dramatic...
Memory 101: Smaller stuff means holding more in RAM at once and thus fewer hard disk reads. If the DB is big enough and you're running certain kinds of queries, this could be a very serious factor. But it probably won't make big difference.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With