Choosing data type for MySQL?

Tags:

I have been researching and reading about SQL data types for a few days now (I know... I know, that's not very long) and one of the things that is hard for me to grasp is how to pick the best data type for expandability, efficiency, and ease of access.

I think it's pretty straight forward how to choose basic Data Types (i.e. int vs varchar) but how do you choose between things like blob and text type.

The MySQL man pages are great but they are not what we computer geeks love... efficient.

I think it would be awesome if we could compile a list of the MySQL data types, the general advantages/disadvantages of each and when it would make sense to choose that data type.

873

asked Apr 08 '11 00:04

austinbv

1 Answers

MySQL string types come in two variants: One without a character set label and one with a character set label.

A fixed length string, padded with spaces at the end, is CHAR(n). The matching type that has no character set label is BINARY(n). Storing the string "hello" in a CHAR(255) CHARSET utf8 will take 765 bytes (string padded with spaces to full length, stored as utf8 which as a worst case space usage of 3 bytes/character allocates 3*255 bytes).

A variable length string with one or two length bytes and no padding is VARCHAR((n). The matching type that has no character set label is VARBINARY(n). Storing the string "hello" in a VARCHAR(255) CHARSET utf8 will take 6 bytes (1 length byte plus 5 bytes for the actual text). Storing the string クリス in the same type will take 10 bytes (1 length byte plus 3 characters using 3 bytes per character to represent them).

mysql> select hex('クリス'), length(hex('クリス'))/2 as bytes;
+--------------------+--------+
| hex('クリス')      | bytes  |
+--------------------+--------+
| E382AFE383AAE382B9 | 9.0000 |
+--------------------+--------+
1 row in set (0.02 sec)

A variable length string with one, two, three or four length bytes is TINYTEXT, TEXT, MEDIUMTEXT and LARGETEXT. The matching types that have no character set label are TINYBLOB, BLOB, MEDIUMBLOB and LARGEBLOB.

A TEXT/BLOB-like type differs from a VARCHAR/VARBINARY-like type in how and where the data is stored, see http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ for details on how TEXT/BLOB-like types are stored in InnoDB depending on version and ROW_FORMAT settings. For performance reasons, you want a recent version of InnoDB and "Barracuda"-Format tables.

MySQL is incapable of working with any data that is larger than max_allowed_packet (default: 1M) in size, unless you construct complicated and memory intense workarounds at the server side. This further restricts what can be done with TEXT/BLOB-like types, and generally makes the LARGETEXT/LARGEBLOB type useless in a default configuration.

For types without a character set label (BINARY, VARBINARY and %BLOB%), MySQL will accept the data as received and write it to disk. For types with a character set label, MySQL will look at what you announced as your clients character set to the server with SET NAMES and what is the columns defined character set label. It will then convert from the connection character set to the column character set and write the converted data. You can check that with the HEX() function, e.g. SELECT HEX(str) FROM t WHERE id = ....

On retrieval, the connections announced character set with SET NAMES can be different from what it had been at write time. MySQL again will check the columns character set label against the character set announced for this connection, and if necessary, will convert to the connection character set.

The performance penalty for that conversion either way is negligible compared to the time taken for the disk I/O incurred for such data anyway, to performancewise it hardly matters which type you choose. The rule is instead: Chose a type with a character set label, if you are working with textual data, and a type without if you are not.

A related question often asked: Shall I choose CHAR or VARCHAR (BINARY or VARBINARY respectively)?

For InnoDB, the answer is always: Chose the variable length data type. There is no performance benefit from fixed length data types in InnoDB, ever, but there is a huge size penalty if you are choosing a fixed length data type and then are not using all space in it. Plus fixed length SQL string types have really weird rules regarding padding and trimming with spaces at the end which you probably can't be bothered to learn. For MySQL the case can be different, but almost never is.

Another related question: Shall I choose VARCHAR or TEXT for my strings (VARBINARY or BLOB, respectively)?

The answer for that is using a recent version of InnoDB, Barracuda format tables and then TEXT/BLOB. The reason for that is explained at length in http://www.mysqlperformanceblog.com/2011/04/07/innodb-row-size-limitation/. The upshot of that is: With either VARCHAR or TEXT/BLOB in pre-Barracuda format you run the risk of overflowing the InnoDB row size limitation if you have too many of them in a single row.

And finally: Shall I store files/images/other large blob or text data in the database?

The answer for that is: Usually not. Serving files from a database (http://mysqldump.azundris.com/archives/36-Serving-Images-From-A-Database.html) is an expensive operation compared to serving files from a filesystem. If at all possible, you'd want to do that instead. There is a way around that, http://www.blobstreaming.org/, but that is advanced technology that requires that you have total control of your execution environment, which is never the case in a hosted environment.

To round this off: There are no variable length data types in MEMORY engine tables. So if you are seeing "using temporary" in an EXPLAIN output, this means

VARCHAR is converted to CHAR in that temporary table
VARBINARY is converted to BINARY

If the temporary table by this process becomes larger than tmp_table_size OR max_heap_table_size, it is on the fly converted to MyISAM format and goes to disk.

Example: You are defining a Ruby Active Record class User with contains ten fields tagged as :string. Each of these end up being VARCHAR(255) CHARSET utf8 in your Users table.

Elsewhere in your codebase, Users is being used in a way that involves a plan using temporary. You are instantly dying in disk operations under load, because each row of the Users table is now using 7650 bytes in MEMORY at least, most of them being spaces used as padding. This forces the temporary table to be converted to MyISAM and written to disk.

any %TEXT% or %BLOB% type cannot be represented in MEMORY, so the temporary table goes to disk as MyISAM even if it would have been sufficiently small to be kept in memory according to the limits stated above.

This implies that any query with a TEXT or BLOB type and a plan with "using temporary" needs to be rewritten in order to avoid temp tables hitting the disk.

122

answered Nov 08 '22 20:11

Isotopp

Related questions
                            
                                How wrong is it to have a unique and normal index on the same column?
                            
                                Maximum number of workable tables in SQL Server And MySQL
                            
                                Grouping timestamps in MySQL with PHP
                            
                                what are the main differences between all the mysql engines?
                            
                                Sync between Sql Server and Mysql Server
                            
                                Converting MyISAM to InnoDB. Beneficial? Consequences?
                            
                                Understanding keys in databases
                            
                                MySQL Find the total amount of posts per user
                            
                                How can SELECT UTC_TIMESTAMP() return -10:00 UTC?
                            
                                Grant permissions to a set of databases matching a pattern in MysQL 5.0
                            
                                How to Display result of subquery rows as one column in MySQL?
                            
                                Optimal MySQL temporary tables (memory tables) configuration?
                            
                                update with join statement mysql?
                            
                                Which best fits my needs: MongoDB, CouchDB, or MySQL. Criteria defined in question
                            
                                Searching MySQL Bin Log for a Query
                            
                                How to save other languages in mysql table?
                            
                                MySQL Select: WHERE (time now) = BETWEEN tablevalue AND tablevalue
                            
                                SQL Query where a field value does not contain empty spaces
                            
                                Finding max in mysql when column is varchar
                            
                                Default value for empty integer fields when importing CSV data in MySQL

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Choosing data type for MySQL?

Tags:

performance

sql

database

mysql

austinbv

People also ask

1 Answers

Isotopp

Recent Activity

Donate For Us