
MySQL UTF8 varchar column size

The MySQL documentation says that since 5.0, VARCHAR lengths are given in characters, not bytes. However, I recently got data-truncation warnings when inserting values that should have fit into their VARCHAR column.

I reproduced the issue with a simple table in v5.1:

mysql> show create table test\G
*************************** 1. row ***************************
Table: test
Create Table: CREATE TABLE `test` (
  `string` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

I then inserted several 10-character values containing differing numbers of multibyte UTF-8 characters:

mysql> insert into test (string) values 
    -> ('abcdefghij'),
    -> ('ãáéíçãáéíç'),
    -> ('ãáéíç67890'),
    -> ('éíç4567890'),
    -> ('íç34567890');
Query OK, 5 rows affected, 4 warnings (0.06 sec)
Records: 5  Duplicates: 0  Warnings: 4

mysql> show warnings;
+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1265 | Data truncated for column 'string' at row 2 |
| Warning | 1265 | Data truncated for column 'string' at row 3 |
| Warning | 1265 | Data truncated for column 'string' at row 4 |
| Warning | 1265 | Data truncated for column 'string' at row 5 |
+---------+------+---------------------------------------------+

mysql> select * from test;
+------------+
| string     |
+------------+
| abcdefghij |
| ãáéíç |
| ãáéíç |
| éíç4567 |
| íç345678 |
+------------+
5 rows in set (0.00 sec)

I think this shows that the VARCHAR size is still defined in bytes, or at least is not measured accurately in characters.

So the question is: is this a bug, or am I misinterpreting the documentation?

sreimer asked May 01 '12 17:05





1 Answer

It's true that VARCHAR and CHAR sizes are measured in characters, not bytes.

I was able to recreate your issue when I set my connection character set to latin1 (single byte).

Before running the INSERT, set your connection character set to UTF-8 with the following command:

SET NAMES utf8;

If you don't, the server treats each two-byte UTF-8 character as two single-byte latin1 characters. A 10-character string containing multibyte characters then counts as more than 10 characters and is truncated.
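To make the byte-versus-character mix-up concrete, here is a rough Python sketch of what appears to happen when a UTF-8 terminal talks to the server over a latin1 connection. The helper name `simulate_latin1_insert` is hypothetical, and Python's `latin-1` codec is only an approximation of MySQL's latin1 (it is close enough for the characters used in the question). This is an illustration of the mechanism, not of MySQL's actual implementation.

```python
def simulate_latin1_insert(value: str, width: int = 10) -> str:
    """Hypothetical helper: mimic inserting UTF-8 bytes over a latin1 connection."""
    wire_bytes = value.encode("utf-8")        # bytes a UTF-8 terminal actually sends
    as_latin1 = wire_bytes.decode("latin-1")  # server reads one character per byte
    stored = as_latin1[:width]                # VARCHAR(width) truncates by characters
    # On SELECT the stored characters become the same bytes again,
    # and the terminal renders those bytes as UTF-8.
    return stored.encode("latin-1").decode("utf-8", errors="replace")


# The five values from the question, mapped to what the SELECT would display.
samples = ["abcdefghij", "ãáéíçãáéíç", "ãáéíç67890", "éíç4567890", "íç34567890"]
results = {s: simulate_latin1_insert(s) for s in samples}
```

Under these assumptions, `results` holds the same truncated strings as the SELECT output in the question: each accented character uses up two of the ten available character slots.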

You might consider changing your default client character set.

Marcus Adams answered Sep 28 '22 09:09
