Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dynamic size for data type in MySQL

I want to save a string in a column of table in MySQL. Sometimes it is 2 characters and sometimes very larger. 50000 characters. Is it a good idea to user varchar(50000). If I only save 2 characters in the column, does it use all 4998 bytes or just 2 bytes?

like image 216
Sponge Bob Avatar asked Nov 24 '13 11:11

Sponge Bob


People also ask

What data type we can use for a 50000 length string to store in a table?

A VARCHAR(50000) column storing a 2-character string would require L +2 bytes, where L is the number of bytes required to encode that 2-character string in the column's character set: it certainly will not use "all 4998 bytes".

What is dynamic column in MySQL?

Dynamic columns is a feature that allows one to store different sets of columns for each row in a table. It works by storing a set of columns in a blob and having a small set of functions to manipulate it.


1 Answers

As documented under Data Type Storage Requirements:

Storage Requirements for String Types

In the following table, M represents the declared column length in characters for nonbinary string types and bytes for binary string types. L represents the actual length in bytes of a given string value.

╔═════════════════════════════╦═════════════════════════════════════════════════╗
║          Data TypeStorage Required                 ║
╠═════════════════════════════╬═════════════════════════════════════════════════╣
║ CHAR(M)                     ║ M × w bytes, 0 <= M <= 255, where w is the      ║
║                             ║ number of bytes required for the maximum-length ║
║                             ║ character in the character set                  ║
╠═════════════════════════════╬═════════════════════════════════════════════════╣
║ BINARY(M)                   ║ M bytes, 0 <= M <= 255                          ║
╠═════════════════════════════╬═════════════════════════════════════════════════╣
║ VARCHAR(M), VARBINARY(M)    ║ L + 1 bytes if column values require 0 – 255    ║
║                             ║ bytes, L + 2 bytes if values may require more   ║
║                             ║ than 255 bytes                                  ║
╠═════════════════════════════╬═════════════════════════════════════════════════╣
║ TINYBLOB, TINYTEXT          ║ L + 1 bytes, where L < 28                         ║
╠═════════════════════════════╬═════════════════════════════════════════════════╣
║ BLOB, TEXT                  ║ L + 2 bytes, where L < 216                        ║
╠═════════════════════════════╬═════════════════════════════════════════════════╣
║ MEDIUMBLOB, MEDIUMTEXT      ║ L + 3 bytes, where L < 224                        ║
╠═════════════════════════════╬═════════════════════════════════════════════════╣
║ LONGBLOB, LONGTEXT          ║ L + 4 bytes, where L < 232                        ║
╠═════════════════════════════╬═════════════════════════════════════════════════╣
║ ENUM('value1','value2',...) ║ 1 or 2 bytes, depending on the number of        ║
║                             ║ enumeration values (65,535 values maximum)      ║
╠═════════════════════════════╬═════════════════════════════════════════════════╣
║ SET('value1','value2',...)  ║ 1, 2, 3, 4 or 8 bytes, depending on the number  ║
║                             ║ of set members (64 members maximum)             ║
╚═════════════════════════════╩═════════════════════════════════════════════════╝

Variable-length string types are stored using a length prefix plus data. The length prefix requires from one to four bytes depending on the data type, and the value of the prefix is L (the byte length of the string). For example, storage for a MEDIUMTEXT value requires L bytes to store the value plus three bytes to store the length of the value.

To calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multi-byte characters. In particular, when using the utf8 (or utf8mb4) Unicode character set, you must keep in mind that not all characters use the same number of bytes and can require up to three (four) bytes per character. For a breakdown of the storage used for different categories of utf8 or utf8mb4 characters, see Section 10.1.10, “Unicode Support”.

VARCHAR, VARBINARY, and the BLOB and TEXT types are variable-length types. For each, the storage requirements depend on these factors:

  • The actual length of the column value

  • The column's maximum possible length

  • The character set used for the column, because some character sets contain multi-byte characters

For example, a VARCHAR(255) column can hold a string with a maximum length of 255 characters. Assuming that the column uses the latin1 character set (one byte per character), the actual storage required is the length of the string (L), plus one byte to record the length of the string. For the string 'abcd', L is 4 and the storage requirement is five bytes. If the same column is instead declared to use the ucs2 double-byte character set, the storage requirement is 10 bytes: The length of 'abcd' is eight bytes and the column requires two bytes to store lengths because the maximum length is greater than 255 (up to 510 bytes).

Therefore, in answer your question:

If I only save 2 characters in the column, does it use all 4998 bytes or just 2 bytes?

A VARCHAR(50000) column storing a 2-character string would require L+2 bytes, where L is the number of bytes required to encode that 2-character string in the column's character set: it certainly will not use "all 4998 bytes".

like image 94
eggyal Avatar answered Oct 17 '22 21:10

eggyal