According to the mysql documentation:
A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes
AND
The maximum row size constrains the number of columns because the total width of all columns cannot exceed this size. For example, utf8 characters require up to three bytes per character, so for a CHAR(255) CHARACTER SET utf8 column, the server must allocate 255 × 3 = 765 bytes per value. Consequently, a table cannot contain more than 65,535 / 765 = 85 such columns.
For clarity, what then is the maximum value I can set in the varchar argument so it only uses 1 byte to store its length?
From the MySQL documentation:
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
A [VARCHAR] column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
This makes the answer to your question depend on the character encoding.
With a single-byte encoding like windows-1252 (which MySQL calls latin1), the character length is the same as the byte length, so you can use a VARCHAR(255)
.
With UTF-8, a VARCHAR(N) may require up to 3N bytes, as would be the case if all characters were in the range U+0800 to U+FFFF. Thus, a VARCHAR(85)
is the greatest that ensures a single-byte byte length (requiring a maximum of 255 bytes).
(Note that MySQL apparently does not support characters outside the BMP. The official definition of UTF-8 allows 4 bytes per character.)
For clarity, what then is the maximum value I can set in the varchar argument so it only uses 1 byte to store its length?
This depends on the collation of the VARCHAR
column.
As you noted, UTF8
may use up to three bytes per character, so if your declare a UTF8
column more than 85
characters long, there is a chance that it will use more than 255
bytes to store its data, and the length hence should be stored in a two-byte field.
If you use latin1
, each character is stored in 1
byte.
So the answer is:
VARCHAR(85) COLLATE UTF8_GENERAL_CI
, or
VARCHAR(255) COLLATE LATIN1_GENERAL_CI
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With