Currently I'm using VARCHAR
/TEXT
with utf8_general_ci
for all character columns in mysql. Now I want to improve database layout/performance.
What I figured out so far is to better use
CHAR
instead of VARCHAR
for fixed length columns as GUIDs or session idsCHAR
for small columns having length of 1 or maybe 2?As I do not want to go as wide to save my GUIDs as BINARY(16)
because of handling issues, I'd rather save them as CHAR(32)
to especially improve keys. (I would even save 2/3 when switching from utf8 to some 1-byte-charset)
Is it good practice to mix up different character sets in same mysql (innodb) table? Or do I get better performance when all columns have same charset within same table? Or even database?
GUID/UUID/MD5/SHA1 are all hex and dash. For them
CHAR(..) CHARACTER SET ascii COLLATE ascii_general_ci
That will allow for A
=a
when comparing hex strings.
For Base64 things, use either of
CHAR(..) CHARACTER SET ascii COLLATE ascii_bin
BINARY(..)
since A
is not semantically the same as a
.
Further notes...
BINARY
may be a tiny bit faster than any _bin
collation, but not enough to notice.CHAR
for columns that are truly fixed length; don't mislead the user by using it for other cases.%_bin
is faster than %_general_ci
, which is faster than other collations. Again, you would be hard-pressed to measure a difference.TINYTEXT
or TINYBLOB
.utf8mb4
, use utf8mb4_unicode_520_ci
(or utf8mb4_900_ci
if using version 8.0). The 520 and 900 refer to Unicode standards; new collations are likely to come in the future.If you are entirely in Czech, then consider these charsets and collations. I list them in preferred order:
mysql> show collation like '%czech%';
+------------------+---------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+------------------+---------+-----+---------+----------+---------+
| utf8mb4_czech_ci | utf8mb4 | 234 | | Yes | 8 | -- opens up the world
| utf8_czech_ci | utf8 | 202 | | Yes | 8 | -- opens up most of the world
| latin2_czech_cs | latin2 | 2 | | Yes | 4 | -- kinda like latin1
The rest are "useless":
| cp1250_czech_cs | cp1250 | 34 | | Yes | 2 |
| ucs2_czech_ci | ucs2 | 138 | | Yes | 8 |
| utf16_czech_ci | utf16 | 111 | | Yes | 8 |
| utf32_czech_ci | utf32 | 170 | | Yes | 8 |
+------------------+---------+-----+---------+----------+---------+
7 rows in set (0.00 sec)
More
ENUM
is 1 byte, yet acts like a string. So you get the "best of both worlds". (There are drawbacks, and there is a 'religious war' among advocates for ENUM
vs TINYINT
vs VARCHAR
.)country_code
is always 2 letters, always ascii, always could benefit from case insensitive collation. So CHAR(2) CHARACTER SET ascii COLLATE ascii_general_ci
is optimal. If you have something that is sometimes 1-char, sometimes 2, then flip a coin; whatever you do won't make much difference.VARCHAR
(up to 255) has an extra 1-byte length attached to it. So, if your strings vary in length at all, VARCHAR
is at least as good as CHAR
. So simplify your brain processing: "variable length --> `VARCHAR".BIT
, depending on version, may be implemented as a 1-byte TINYINT UNSIGNED
. If you have only a few bits in your table, it is not worth worrying about.CREATE TABLE
. I often see tables with BIGINT
and DOUBLE
(each 8 bytes) that could easily use smaller columns. Sometimes saving more than 50% (space).ENUM
ALTER TABLE
, though it can be "inplace".'unknown'
(or something like that) and making the column NOT NULL
(versus NULL
).A
versus a
).If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With