I came across a comment on the following blog post that recommends against using MEDIUMINT:
Don’t use [the 24bit INT], even in MySQL. It’s dumb, and it’s slow, and the code that implements it is a crawling horror.
4294967295 and MySQL INT(20) Syntax Blows
An answer on Stack Overflow also notes that SQL Server, PostgreSQL and DB2 don't support MEDIUMINT:
What is the difference between tinyint, smallint, mediumint, bigint and int in MySQL?
Should MEDIUMINT be avoided, or should I continue to use it in the cases where it best represents the data I am storing?
MEDIUMINT: a medium-sized integer that can be signed or unsigned. If signed, the allowable range is from -8388608 to 8388607. If unsigned, the allowable range is from 0 to 16777215. You can specify a display width of up to 9 digits.
The number in parentheses is only the display width. BIGINT takes 8 bytes, i.e. 64 bits. The signed range is -9223372036854775808 to 9223372036854775807, and the unsigned range is 0 to 18446744073709551615.
In MySQL, an INT (often written int(11)) is 4 bytes, i.e. 32 bits. The signed range is -2^(32-1) to 2^(32-1)-1 = -2147483648 to 2147483647. The unsigned range is 0 to 2^32-1 = 0 to 4294967295.
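The ranges quoted above all follow from the storage size in bytes. Here is a minimal sketch of that arithmetic, assuming two's-complement storage; the function names are made up for illustration and are not part of MySQL.

```c
#include <stdint.h>

/* Hypothetical helpers: range of an integer stored in `bytes` bytes (1..8),
 * assuming two's complement. The 8-byte case is special-cased because
 * shifting a 64-bit value by 64 bits is undefined in C. */
int64_t int_signed_min(unsigned bytes) {
    return (bytes == 8) ? INT64_MIN : -((int64_t)1 << (8 * bytes - 1));
}

int64_t int_signed_max(unsigned bytes) {
    return (bytes == 8) ? INT64_MAX : ((int64_t)1 << (8 * bytes - 1)) - 1;
}

uint64_t int_unsigned_max(unsigned bytes) {
    return (bytes == 8) ? UINT64_MAX : ((uint64_t)1 << (8 * bytes)) - 1;
}
```

With bytes set to 3 this gives the MEDIUMINT limits (-8388608, 8388607, 16777215), with 4 the INT limits, and with 8 the BIGINT limits.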
InnoDB stores a MEDIUMINT as a three-byte value. But when MySQL has to do any computation, the three-byte MEDIUMINT is converted into an eight-byte unsigned long int (I assume nobody runs MySQL on 32 bits nowadays).
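To give a feel for how cheap widening a 24-bit value is, here is a rough sketch of sign-extending a 24-bit two's-complement value into a 64-bit integer. This is generic C for illustration only: `sign_extend_24` is a hypothetical name, and the sketch does not claim to reproduce InnoDB's actual on-disk encoding of signed integers.

```c
#include <stdint.h>

/* Hypothetical helper: interpret the low 24 bits of `raw` as a
 * two's-complement number and widen it to 64 bits. */
int64_t sign_extend_24(uint32_t raw) {
    raw &= 0xFFFFFFu;                     /* keep only the 24 payload bits */
    if (raw & 0x800000u) {                /* bit 23 set: value is negative */
        return (int64_t)raw - 0x1000000;  /* subtract 2^24 */
    }
    return (int64_t)raw;
}
```

It is one mask, one test and at most one subtraction, which is negligible next to fetching the row.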
There are pros and cons, but you understand that "It’s dumb, and it’s slow, and the code that implements it is a crawling horror" reasoning is not technical, right?
I would say MEDIUMINT makes sense when data size on disk is critical, i.e. when a table has so many records that even a one-byte difference (4-byte INT vs 3-byte MEDIUMINT) means a lot. It's a rather rare case, but possible.
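A back-of-envelope estimate of what that one byte buys could look like the sketch below. The function and its `copies` parameter are assumptions for illustration: `copies` counts how many places each value is stored (the row itself plus every index that contains the column), and page and record overhead are ignored.

```c
#include <stdint.h>

/* Hypothetical estimate: bytes saved by storing a column as a 3-byte
 * MEDIUMINT instead of a 4-byte INT. Ignores page/record overhead. */
uint64_t bytes_saved(uint64_t rows, unsigned copies) {
    const uint64_t int_size = 4;        /* INT */
    const uint64_t mediumint_size = 3;  /* MEDIUMINT */
    return rows * (uint64_t)copies * (int_size - mediumint_size);
}
```

For a billion rows with the column stored once, that is about 1 GB; for a million rows it is about 1 MB, which is rarely worth thinking about.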
mach_read_from_3 and mach_read_from_4, the primitives that InnoDB uses to read numbers from InnoDB records, are similar. They both return ulint. I bet you won't notice a difference on any workload.
Just take a look at the code:
    ulint
    mach_read_from_3(
    /*=============*/
        const byte* b)  /*!< in: pointer to 3 bytes */
    {
        ut_ad(b);
        return( ((ulint)(b[0]) << 16)
              | ((ulint)(b[1]) << 8)
              | (ulint)(b[2])
              );
    }
Do you think it's much slower than this?
    ulint
    mach_read_from_4(
    /*=============*/
        const byte* b)  /*!< in: pointer to four bytes */
    {
        ut_ad(b);
        return( ((ulint)(b[0]) << 24)
              | ((ulint)(b[1]) << 16)
              | ((ulint)(b[2]) << 8)
              | (ulint)(b[3])
              );
    }
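Both functions are the same shift-and-OR pattern over big-endian bytes; the 3-byte version simply does one shift and one OR fewer. A generic sketch of that pattern, using standard C types in place of InnoDB's `ulint` and `byte` (the name `read_be` is made up here and is not an InnoDB function):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generalization: read an n-byte big-endian unsigned
 * integer (1 <= n <= 8) starting at b. */
uint64_t read_be(const uint8_t *b, size_t n) {
    assert(b != NULL && n >= 1 && n <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) {
        v = (v << 8) | b[i];  /* shift in one byte at a time */
    }
    return v;
}
```

Calling it with n = 3 or n = 4 does the same work as the two hand-unrolled functions above.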
In the grand scheme of things, fetching a row is the big cost. Simple functions, expressions, and, even more so, data formats are insignificant in how long a query takes.
On the other hand, if your dataset is too large to stay cached, the overhead of I/O to fetch row(s) is even more significant. A crude rule of thumb says that a non-cached row takes 10 times as long as a cached one. Hence, shrinking the dataset (such as by using a smaller *INT) may give you a huge performance benefit.
This argument applies to ...INT, FLOAT vs DOUBLE, DECIMAL(m,n), DATETIME(n), etc. (A different discussion is needed for [VAR]CHAR/BINARY(...) and TEXT/BLOB.)
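The caching rule of thumb above can be put into numbers with a simple weighted average. This is only a sketch: the function name is invented, the cost unit is "one cached row fetch", and the factor of 10 is the crude rule of thumb, not a measurement.

```c
/* Hypothetical model: average cost of fetching one row, in units of one
 * cached fetch. hit_ratio is the fraction of rows found in the cache;
 * miss_penalty is how many times slower a non-cached row is (about 10
 * per the rule of thumb). */
double avg_row_cost(double hit_ratio, double miss_penalty) {
    return hit_ratio * 1.0 + (1.0 - hit_ratio) * miss_penalty;
}
```

Under this model, if a smaller dataset raises the hit ratio from 50% to 75%, the average cost per row drops from 5.5 to 3.25 units.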
For those with a background in Assembly language...
Hence, the only sane way to write the code is to work at the byte level, and to ignore register size and assume all values are mis-aligned.
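As a rough illustration of working at the byte level, the sketch below packs a 3-byte field directly followed by a 4-byte field into a record buffer, so the 4-byte field starts at offset 3 and is mis-aligned by design. The names `store_be` and `pack_record` and the 7-byte layout are assumptions for this example, not InnoDB code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low n bytes (1 <= n <= 8) of value at dst, most significant
 * byte first. Only single-byte writes are used, so dst needs no alignment. */
static void store_be(uint8_t *dst, uint64_t value, size_t n) {
    assert(dst != NULL && n >= 1 && n <= 8);
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)(value >> (8 * (n - 1 - i)));
    }
}

/* Hypothetical record layout: a 3-byte field at offset 0 followed by a
 * 4-byte field at offset 3. Returns rec for convenience. */
uint8_t *pack_record(uint8_t rec[7], uint32_t medium, uint32_t full) {
    store_be(rec, medium, 3);
    store_be(rec + 3, full, 4);
    return rec;
}
```

Because every access is a single byte, the same code behaves identically whatever the register size or the alignment of the buffer.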
For Optimization, in order of importance:
Rule of Thumb: If a tentative optimization does not (via a back-of-envelope calculation) yield a 10% improvement, don't waste your time on it. Instead, look for some bigger improvement. For example, indexes and summary tables often provide 10x (not just 10%).
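One way to do that back-of-envelope calculation is sketched below. The function names and the example fractions are assumptions for illustration; `fraction` is your own guess at how much of the query time the optimized part takes.

```c
/* Fraction of total query time saved if a part that takes `fraction` of
 * the time becomes `speedup` times faster. */
double overall_saving(double fraction, double speedup) {
    return fraction * (1.0 - 1.0 / speedup);
}

/* Apply the 10% rule of thumb: returns 1 if worth pursuing, else 0. */
int worth_pursuing(double fraction, double speedup) {
    return overall_saving(fraction, speedup) >= 0.10;
}
```

If decoding integers were, say, 1% of a query and you made it twice as fast, the saving would be 0.5%, well under the threshold. Something that takes 90% of the time and gets 10x faster saves 81%.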