MySQL documentation says that since 5.0, varchar lengths refer to character units, not bytes. However, I recently came across an issue where I was getting truncated data warnings when inserting values that should have fit into the varchar column it was designated.
I replicated this issue with a simple table in v5.1
mysql> show create table test\G
*************************** 1. row ***************************
Table: test
Create Table: CREATE TABLE `test` (
`string` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
I then inserted multiple 10 characters values with differing amounts of UTF8 characters
mysql> insert into test (string) values
-> ('abcdefghij'),
-> ('ãáéíçãáéíç'),
-> ('ãáéíç67890'),
-> ('éíç4567890'),
-> ('íç34567890');
Query OK, 5 rows affected, 4 warnings (0.06 sec)
Records: 5 Duplicates: 0 Warnings: 4
mysql> show warnings;
+---------+------+---------------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------------+
| Warning | 1265 | Data truncated for column 'string' at row 2 |
| Warning | 1265 | Data truncated for column 'string' at row 3 |
| Warning | 1265 | Data truncated for column 'string' at row 4 |
| Warning | 1265 | Data truncated for column 'string' at row 5 |
+---------+------+---------------------------------------------+
mysql> select * from test;
+------------+
| string |
+------------+
| abcdefghij |
| ãáéíç |
| ãáéíç |
| éíç4567 |
| íç345678 |
+------------+
5 rows in set (0.00 sec)
I think that this shows that the varchar size is still defined in bytes or at least, is not accurate in character units.
The question is, am I understanding the documentation correctly and is this a bug? Or am I misinterpreting the documentation?
VARCHAR(255) stores 255 characters, which may be more than 255 bytes.
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
VARCHAR(1024) 1024 this is lenght. The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters. Follow this answer to receive notifications. answered May 18, 2020 at 10:55.
With character data, no. Physically, there's no difference between storing or indexing a 40-byte string as varchar(128) or varchar(512). An ASCII value stored as nvarchar(100) will take twice the space as varchar(100), unless row-level compression is in use.
It's true that VARCHAR and CHAR sizes are considered in characters, not bytes.
I was able to recreate your issue when I set my connection character set to latin1 (single byte).
Ensure that you set your connection character set to UTF8 prior to running the insertion query with the following command:
SET NAMES utf8
If you don't do this, a two-byte UTF8 character will get sent as two single-byte characters.
You might consider changing your default client character set.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With